CN114969800A - Data mean value statistical method and device for privacy protection - Google Patents


Publication number
CN114969800A
Authority
CN
China
Prior art keywords
data
binary data
processed
bit
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110198079.3A
Other languages
Chinese (zh)
Inventor
潘宣辰
陈诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Antiy Mobile Security Co ltd
Original Assignee
Wuhan Antiy Mobile Security Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Antiy Mobile Security Co ltd
Priority to CN202110198079.3A
Publication of CN114969800A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures

Abstract

Embodiments of the invention provide a data mean value statistical method and device for privacy protection. The method comprises: determining N probability data, wherein the values of the N probability data are all different or only partially the same, each value lies in the range 0 to 1, and N equals the length of the binary data to be processed; flipping each bit of the binary data to be processed with a preset probability, wherein the preset probability is one of the N probability data; and sending the flipped binary data to a server side so that the server side can perform mean value statistics on it. The embodiments improve usability while preserving privacy.

Description

Data mean value statistical method and device for privacy protection
Technical Field
The embodiments of the invention relate to the field of Internet technology, and in particular to a data mean value statistical method and device for privacy protection.
Background
With the advent of the big data era, data mining has become increasingly widespread, but it often exposes user privacy and damages the reputation of enterprises. Conventional privacy-protection methods rely on specific attack assumptions and on the attacker having only certain background knowledge, so they cannot fully guarantee the security of private information. In recent years, researchers have therefore studied differential privacy, which holds even under the strongest assumptions about an attacker's background knowledge and makes the privacy guarantee quantifiable. In Local Differential Privacy (LDP), each user perturbs its data locally before sending the data to an untrusted party, so that no one other than the data owner can access the original data. To date, LDP has been deployed by several major software and Internet companies in practical products. The CMS/HCMS system proposed by Apple runs on hundreds of millions of iOS and macOS devices and performs various mobile data collection tasks, such as identifying popular emoji and popular health data types in Safari. RAPPOR, proposed by Google, is deployed in Google Chrome and collects user behavior data from Chrome for statistical analysis while protecting privacy. Microsoft has also deployed LDP to collect telemetry data for mean and histogram estimation over time.
Currently, the main research directions in local differential privacy are perturbation mechanisms, single-value frequency statistics, multi-value frequency statistics, and mean statistics. Perturbation mechanisms are mainly based on randomized response, information compression, or distortion. The perturbation framework of randomized response is concise and intuitive, and its degree of perturbation can be quantified directly, so most research on local differential privacy builds on randomized response, including frequency statistics for discrete data and mean statistics for continuous data. Research on mean statistics for continuous data is comparatively scarce. MeanEst and its improved version Harmony-Mean are the classic methods; the main idea of MeanEst is to discretize numerical data into two values. However, because each numerical value is mapped to one of two values with a certain probability, the result deviates too far from the original value. When RAPPOR is used to compute the mean of continuous data, each value is first mapped by a hash function to a binary string of fixed length, and each bit is then flipped with the same probability. Because the hash output is 256 bits long, the communication cost is high.
Disclosure of Invention
In view of the problems in the prior art, embodiments of the invention provide a data mean value statistical method and device for privacy protection in which, when the binary data is perturbed, the bits are not all flipped with the same probability; instead, the flip probability depends on the bit position, which improves usability while preserving privacy.
In a first aspect, an embodiment of the present invention provides a data mean value statistical method for privacy protection, applied to a client, including:
determining N probability data, wherein the values of the N probability data are all different or only partially the same, each value lies in the range 0 to 1, and N equals the length of the binary data to be processed;
flipping each bit of the binary data to be processed with a preset probability, wherein the preset probability is one of the N probability data;
and sending the flipped binary data to a server side so that the server side can perform mean value statistics on the binary data.
Further, flipping each bit of the binary data to be processed with a preset probability specifically includes:
flipping each bit of the binary data to be processed with a preset probability, wherein the preset probabilities corresponding to the bits increase in order from the high bit to the low bit.
Further, flipping each bit of the binary data to be processed with a preset probability specifically includes:
flipping each bit of the binary data to be processed with a preset probability, wherein one or more specific bits of the binary data to be processed are assigned one or more specific probabilities, and the preset probabilities corresponding to the remaining bits increase in order from the high bit to the low bit.
Further, the binary data to be processed is obtained as follows:
determining a common factor, wherein the common factor is a positive integer;
dividing the numerical data by the common factor to obtain an integer quotient;
and converting the integer quotient into binary data, wherein the number of bits of the binary data is determined according to the number of possible values of the numerical data.
In a second aspect, an embodiment of the present invention provides a data mean value statistical method for privacy protection, applied to a server side, including:
receiving binary data sent by a client, wherein the binary data is obtained by the client flipping each bit of the binary data to be processed with a preset probability, the preset probability is one of N probability data whose values are all different or only partially the same, each value lies in the range 0 to 1, and N equals the length of the binary data to be processed;
and, for a target data bit, counting the frequency with which 1 appears on that bit across all the received binary data, wherein the frequency is used as the mean value of the target data bit of the binary data.
Further, the binary data being obtained by the client flipping each bit of the binary data to be processed with a preset probability specifically includes:
the binary data is obtained by the client flipping each bit of the binary data to be processed with a preset probability, wherein the preset probabilities corresponding to the bits increase in order from the high bit to the low bit.
Further, the binary data being obtained by the client flipping each bit of the binary data to be processed with a preset probability specifically includes:
the binary data is obtained by the client flipping each bit of the binary data to be processed with a preset probability, wherein one or more specific bits of the binary data to be processed are assigned one or more specific probabilities, and the preset probabilities corresponding to the remaining bits increase in order from the high bit to the low bit.
Further, the binary data to be processed is obtained as follows:
determining a common factor, wherein the common factor is a positive integer;
dividing the numerical data by the common factor to obtain an integer quotient;
and converting the integer quotient into binary data, wherein the number of bits of the binary data is determined according to the number of possible values of the numerical data.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the method for data mean statistics for privacy protection according to the first aspect of the present invention and any optional embodiment thereof, or to perform the method for data mean statistics for privacy protection according to the second aspect of the present invention and any optional embodiment thereof.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer instructions for executing the method for data mean statistics for privacy protection according to the first aspect of the present invention and any optional embodiment thereof, or executing the method for data mean statistics for privacy protection according to the second aspect of the present invention and any optional embodiment thereof.
According to the data mean value statistical method for privacy protection provided by the embodiments of the invention, when the binary data is perturbed, the bits are not all flipped with the same probability. Instead, each bit is flipped with its own preset probability, the preset probabilities being all different or only partially the same, and the binary data flipped in this way is sent to the server side, which improves usability while preserving privacy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of the client-side flow of a data mean value statistical method for privacy protection according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the server-side flow of a data mean value statistical method for privacy protection according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the data flipping process of a data mean value statistical method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a client device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a server-side device according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the framework of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic client-side flow chart of a data mean value statistical method for privacy protection according to an embodiment of the present invention. As shown in FIG. 1, the method is applied to a client and includes:
determining N probability data, wherein the values of the N probability data are all different or only partially the same, each value lies in the range 0 to 1, and N equals the length of the binary data to be processed;
For example, if the length of the binary data to be processed is 5, five probability data are determined, say 0.2, 0.6, 0.25, 0.78 and 0.52, or 0.2, 0.6, 0.25 and 0.6. The probability data may all be different, or some may be equal, as long as at least one differs from the others.
101, flipping each bit of the binary data to be processed with a preset probability, wherein the preset probability is one of the N probability data;
and 102, sending the flipped binary data to a server side so that the server side can perform mean value statistics on the binary data.
In the above example, each bit of the 5-bit binary data is flipped with its own probability and the flipped data is sent to the server side, which only needs to count, for each bit position, the frequency with which 1 occurs.
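The per-bit flipping of step 101 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `flip_bits`, the list-of-integers representation, and the sample input bits are assumptions introduced here; only the five probabilities come from the example above.

```python
import random

def flip_bits(bits, flip_probs, rng=random):
    """Flip each bit independently; bits[i] is flipped with probability flip_probs[i].

    bits is listed from the highest bit to the lowest (an assumed convention).
    """
    if len(bits) != len(flip_probs):
        raise ValueError("need exactly one probability per bit")
    flipped = []
    for bit, p in zip(bits, flip_probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        # rng.random() is uniform on [0, 1), so this branch is taken with probability p.
        flipped.append(bit ^ 1 if rng.random() < p else bit)
    return flipped

# Hypothetical 5-bit input, using the five probabilities from the example above.
report = flip_bits([0, 1, 0, 0, 1], [0.2, 0.6, 0.25, 0.78, 0.52], random.Random(0))
```

Passing a seeded `random.Random` instance keeps the sketch reproducible; a real client would presumably use an unseeded or cryptographically secure source.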
The embodiments of the invention are suitable for processing both discrete and continuous numerical data. For continuous numerical data, in order to reduce the communication cost, the embodiment converts the data into binary data by an approximation method, which specifically includes:
determining a common factor, wherein the common factor is a positive integer; dividing the numerical data by the common factor and rounding to obtain an integer quotient; and converting the integer quotient into binary data, wherein the number of bits of the binary data is determined according to the number of possible values of the numerical data.
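A minimal sketch of this conversion is given below. The function name `encode`, the `max_value` parameter used to derive the bit width, and the choice of rounding to the nearest multiple are assumptions made for illustration; the text only states that the quotient is rounded and that the width follows from the number of possible values.

```python
def encode(value, factor, max_value):
    """Round value to the nearest multiple of factor, divide, and return a fixed-width bit string."""
    if factor <= 0:
        raise ValueError("the common factor must be a positive integer")
    quotient = int(value / factor + 0.5)        # round half up; assumes value >= 0
    max_quotient = max_value // factor          # largest quotient that can occur
    if not 0 <= quotient <= max_quotient:
        raise ValueError("value outside the assumed range")
    width = max(1, max_quotient.bit_length())   # bits needed for the largest quotient
    return format(quotient, "0{}b".format(width))

# Hypothetical example: an age of 20 with common factor 3 and an assumed maximum age of 120.
age_bits = encode(20, 3, 120)
```

With these assumed parameters the quotient range is 0 to 40, so six bits suffice, far fewer than the 256 bits of a hash-based encoding.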
According to the data mean value statistical method for privacy protection provided by the embodiments of the invention, for mean value statistics on numerical data, the numerical data is first approximated, then divided by a common factor, and then converted into binary. When the binary data is perturbed, the bits are not all flipped with the same probability; instead, the flip probability is determined according to the bit position. The flip probabilities may be entirely different, or partly the same and partly different. The flipped binary data is sent to the server side, which improves usability while preserving privacy.
The method described in the embodiments of the present invention is applicable to scenarios such as statistics on memory, the usage duration of mobile terminal applications, the number of installed mobile phone applications, the number of accesses to different applications, and battery capacity. Likewise, to assess the prevalence of botnets or hijacked clients, an operator may wish to monitor how many clients have had their preferences overridden in the past 24 hours, or how many users' Web searches have been redirected to the URLs of known malicious search providers. The invention also provides two further embodiments.
Based on the above embodiment, step 101 of flipping each bit of the binary data to be processed with a preset probability specifically includes:
flipping each bit of the binary data to be processed with a preset probability, wherein the preset probabilities corresponding to the bits increase in order from the high bit to the low bit.
In this embodiment, when the binary data is flipped, the higher the bit, the smaller its flip probability, and the lower the bit, the larger its flip probability; the flip probabilities increase in order from the high bit to the low bit.
For example, if the length of the binary data to be processed is 5 and the five probability data are determined to be 0.2, 0.6, 0.25, 0.78 and 0.52, the flip probabilities assigned from the high bit to the low bit are 0.2, 0.25, 0.52, 0.6 and 0.78 in order; that is, the flip probability of the highest bit is 0.2 and that of the lowest bit is 0.78.
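The assignment in this example amounts to sorting the probabilities in ascending order and handing them out from the highest bit down. A small sketch, in which the function name `ascending_from_high_bit` is hypothetical:

```python
def ascending_from_high_bit(probs):
    """Return the probabilities ordered so that index 0 (the highest bit) gets the smallest one."""
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return sorted(probs)

# The five probability data from the example above.
assigned = ascending_from_high_bit([0.2, 0.6, 0.25, 0.78, 0.52])

# Pair each probability with the place value of its bit, highest bit first.
width = len(assigned)
place_values = [2 ** (width - 1 - i) for i in range(width)]
schedule = list(zip(place_values, assigned))
```

The pairing makes the motivation visible: the bit worth 16 is disturbed least often and the bit worth 1 most often, so a flip tends to shift the encoded value by a small amount.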
Based on the above embodiment, step 101 of flipping each bit of the binary data to be processed with a preset probability specifically includes:
flipping each bit of the binary data to be processed with a preset probability, wherein one or more specific bits of the binary data to be processed are assigned one or more specific probabilities, and the preset probabilities corresponding to the remaining bits increase in order from the high bit to the low bit.
In this embodiment, apart from the specific bits, the higher the bit, the smaller its flip probability, and the lower the bit, the larger its flip probability; the flip probabilities of the remaining bits increase in order from the high bit to the low bit.
For example, suppose the length of the binary data to be processed is 10, the flip probabilities increase in order from the high bit to the low bit, and the 2nd bit is a specific bit that uses the specific probability 0.74. The probabilities are then, for example, 0.1, 0.74, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9; that is, the flip probability of the highest bit is 0.1, that of the 2nd bit is 0.74, and that of the lowest bit is 0.9.
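One way to build such a probability list is sketched below. The function name `assign_with_overrides` and the dictionary of 1-based bit positions (1 being the highest bit) are illustrative assumptions, not part of the described embodiment.

```python
def assign_with_overrides(base_probs, overrides):
    """Build the per-bit flip probabilities, highest bit first.

    base_probs: probabilities for the ordinary bits (handed out in ascending order).
    overrides:  {bit position (1 = highest bit): specific probability}.
    """
    n = len(base_probs) + len(overrides)
    if any(not 1 <= pos <= n for pos in overrides):
        raise ValueError("override positions must lie in 1..N")
    remaining = iter(sorted(base_probs))
    return [overrides[pos] if pos in overrides else next(remaining)
            for pos in range(1, n + 1)]

# The 10-bit example above: bit 2 is a specific bit with probability 0.74.
probs = assign_with_overrides(
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], {2: 0.74})
```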
The embodiments of the invention provide an LDP method that perturbs numerical data at the client. The perturbation mechanism is implemented on the client so that the data satisfies differential privacy, while the server side can still compute the mean from the noisy data received from the clients, so the method has high statistical usability.
One specific embodiment comprises the following steps:
(1) Preprocess the client data. For numerical data, such as the age of a user or the usage duration of a mobile phone application, continuous numerical data is converted into binary in order to reduce the communication cost. The principle is to approximate the data as far as usability allows, so that it can be represented in binary with fewer bits. If the usage duration of a mobile phone is measured in minutes and approximated to a multiple of 5, the set of usage durations may be, for example, {0, 15, 25, 40, 50 …}; if the age of a user is approximated to a multiple of 3, the set of user ages may be, for example, {0, 3, 6, 12, 18 …}.
(2) Convert to binary. The approximated data is divided by a common factor, which is 5 or 3 in the above examples, and then converted into binary; the number of binary bits is determined according to the number of candidate values of the variable.
(3) Randomly flip each bit of the binary data at the client. Using randomized response, each bit of the binary data is flipped, with probabilities that differ or partially differ across bits. There are three flipping strategies: first, the flip probability of each bit is determined randomly, with at least one of the flip probabilities differing from the others; second, taking the significance of each binary digit into account, high bits are flipped with lower probability and low bits with higher probability; third, on the basis of strategy two, one or more specific bits can be designated and given specific probabilities.
(4) Transmit each perturbed binary string to the server side.
(5) Server-side statistics. After receiving the binary string of each user, the server side counts the frequency (number of times) with which 1 appears on each bit position across users and then calculates the mean value of this type of data from the place value that each bit represents. For example, the mean age of the users of a certain application can be computed, and the application can then be improved according to the age characteristics of its users to better suit the preferences of that age group.
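Step (5) can be sketched as below. The function name `estimate_mean` and the string representation of reports are assumptions. The per-bit correction used here is the standard unbiased estimator for binary randomized response, (count - n*p) / (1 - 2*p); the text does not spell out its correction in full, so treating it as this estimator is also an assumption.

```python
def estimate_mean(reports, flip_probs, factor):
    """Estimate the mean of the original values from flipped bit strings.

    reports:    equal-length strings of '0'/'1', highest bit first, one per user.
    flip_probs: flip probability of each bit position, highest bit first.
    factor:     the common factor the client divided by.
    """
    n = len(reports)
    width = len(flip_probs)
    if n == 0 or any(len(r) != width for r in reports):
        raise ValueError("need at least one report, all of the expected width")
    total = 0.0
    for i, p in enumerate(flip_probs):
        if p == 0.5:
            raise ValueError("a flip probability of 0.5 leaves no recoverable signal")
        ones = sum(int(r[i]) for r in reports)        # observed count of 1s on bit i
        true_ones = (ones - n * p) / (1.0 - 2.0 * p)  # de-biased count (assumed estimator)
        total += 2 ** (width - 1 - i) * true_ones / n # place value times estimated frequency
    return factor * total
```

As a sanity check, if no bit is ever flipped (all probabilities 0) the estimate reduces to the exact mean of the approximated values, and if every bit is always flipped (all probabilities 1) the correction undoes the flip exactly.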
FIG. 2 is a schematic diagram of the server-side flow of a data mean value statistical method for privacy protection according to an embodiment of the present invention. As shown in FIG. 2, the method is applied to a server side and includes:
200, receiving binary data sent by a client, wherein the binary data is obtained by the client flipping each bit of the binary data to be processed with a preset probability, the preset probability is one of N probability data whose values are all different or only partially the same, each value lies in the range 0 to 1, and N equals the length of the binary data to be processed;
and 201, for a target data bit, counting the frequency with which 1 appears on that bit across all the received binary data, and taking the frequency as the mean value of the target data bit of the binary data.
This embodiment is the server-side counterpart of the client-side process. The server side receives the binary data sent by the client, each bit of which the client has flipped with different or partially different probabilities. For any bit of the binary data, that is, the target data bit, the server side counts the number of times 1 appears on that bit, which gives the mean value of the target data bit. Correspondingly, the client may have flipped the binary data in either of the following two ways.
Based on the above embodiment, the binary data being obtained by the client flipping each bit of the binary data to be processed with a preset probability specifically includes:
the binary data is obtained by the client flipping each bit of the binary data to be processed with a preset probability, wherein the preset probabilities corresponding to the bits increase in order from the high bit to the low bit.
Based on the above embodiment, the binary data being obtained by the client flipping each bit of the binary data to be processed with a preset probability specifically includes:
the binary data is obtained by the client flipping each bit of the binary data to be processed with a preset probability, wherein one or more specific bits of the binary data to be processed are assigned one or more specific probabilities, and the preset probabilities corresponding to the remaining bits increase in order from the high bit to the low bit.
For how the client flips the binary data and processes discrete and continuous data, refer to the description of the client-side method, which is not repeated here.
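Because the client flips bits, the raw count of 1s on a target data bit is a biased estimate of the true count. The following derivation shows how the bias could be removed, assuming the standard estimator for binary randomized response applies. The symbols are notation introduced here rather than taken from the original text: n is the number of users, c_i the observed count of 1s on bit i, t_i the true count, p_i the flip probability of bit i, F the common factor, and N the number of bits, with bit 1 the highest.

```latex
\begin{aligned}
\mathbb{E}[c_i] &= t_i\,(1 - p_i) + (n - t_i)\,p_i \;=\; n\,p_i + t_i\,(1 - 2p_i), \\
\hat{t}_i &= \frac{c_i - n\,p_i}{1 - 2p_i}, \qquad p_i \neq \tfrac{1}{2}, \\
\hat{\mu} &= \frac{F}{n} \sum_{i=1}^{N} 2^{\,N-i}\,\hat{t}_i .
\end{aligned}
```

The first line counts the true 1s that survive plus the true 0s that are flipped to 1; solving for t_i gives the second line, and weighting each estimated frequency by its place value gives the estimated mean in the third.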
The embodiment of the present invention is further described with reference to fig. 3, and fig. 3 is a schematic diagram of a data inversion process of a data mean statistical method according to the embodiment of the present invention.
(1) Preprocess the client data. Take the usage duration of a certain application as the numerical data. Because the daily usage duration of an application is at most 24 hours, that is, 1440 minutes, the duration may for simplicity be approximated so that its last digit is 5 or 0. For example, 24 minutes is approximated as 25 minutes and 22 minutes as 20 minutes, while 23 minutes is approximated as 20 minutes with the probability given in formula image BDA0002946750600000091 and as 25 minutes with the probability given in formula image BDA0002946750600000092. The usage durations are thus multiples of 5; a common factor of 5 is selected and each duration is divided by 5, which limits the value range to 0-288. The result is then converted into binary and represented with 9 bits. As shown in FIG. 3, assuming that users 1, 2 and 3 use a certain application for 46, 607 and 129 minutes respectively, the approximated values are 45, 605 and 130, and dividing by the common factor 5 gives 9, 121 and 26 respectively.
(2) Binarize the client data. The quotients 9, 121 and 26 become 000001001, 001111001 and 000011010 respectively.
(3) Random perturbation by the client. Through the above processing, numerical mean statistics are converted into frequency statistics on each bit of the binary data. The highest bit is kept unchanged with probability P_1 and flipped with probability 1 - P_1; similarly, the next-highest bit is kept unchanged with probability P_2 and flipped with probability 1 - P_2, and so on. The flipped data obtained are 101001000, 001100101 and 010100010 respectively.
(4) Transmit each user's perturbed binary string to the server side.
(5) The server side computes the data mean. The server side only needs to count the frequency of "1" on each binary bit; for example, from the high bit to the low bit, the frequencies of "1" are 1, 1, 2, 0, 1 and 1. After the counting is completed, each bit is given an unbiased correction. The correction used is that of binary randomized response, C = C_1 / (1 - 2P_i), where P_i is the flip probability. The mean value of the application usage duration is then calculated from the place value of each binary bit and converted into decimal: 5 * (1*2^0 + 1*2^1 + 2*2^2 + 1*2^3 + …).
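Steps (1) and (2) of this worked example can be checked with a few lines of Python. The constant and variable names are hypothetical, and nearest-multiple rounding is assumed in place of the probabilistic rounding described for values such as 23 minutes.

```python
FACTOR = 5
MAX_MINUTES = 24 * 60                          # 1440 minutes in a day
WIDTH = (MAX_MINUTES // FACTOR).bit_length()   # 288 fits in 9 bits

durations = [46, 607, 129]                     # users 1, 2 and 3
quotients = [round(d / FACTOR) for d in durations]   # nearest multiple of 5, divided by 5
approximated = [q * FACTOR for q in quotients]
encoded = [format(q, "0{}b".format(WIDTH)) for q in quotients]

for d, a, q, bits in zip(durations, approximated, quotients, encoded):
    print(d, a, q, bits)

exact_mean = sum(durations) / len(durations)
approx_mean = sum(approximated) / len(approximated)
```

Comparing `exact_mean` with `approx_mean` shows how little the approximation step alone shifts the mean for these three users, before any flipping is applied.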
In summary, in the embodiments of the present invention, for mean value statistics on numerical data, the numerical data is first approximated, then divided by a common factor, and then converted into binary, and the flip probability is determined according to the bit position. The flip probabilities of the binary bits are all different or partially different; or the high bits are flipped with small probability and the low bits with large probability; or specific probabilities are set for specific bits. After perturbing the data randomly in one of these ways, the client sends the data to the server side, which only needs to count the frequency of 1 on each binary bit. The embodiments of the invention ensure that the client randomly perturbs the data before uploading it to the server side, so that the data satisfies differential privacy, while the statistics computed at the server side remain highly usable.
Fig. 4 is a schematic diagram of a client apparatus according to an embodiment of the present invention, and as shown in fig. 4, an embodiment of the present invention provides a client, which includes a probability module 400, a flipping module 401, and a sending module 402;
the probability module 400 is configured to determine N probability data, where the N probability data are set to have different values or partially the same value, and the value of each probability data is in a range from 0 to 1, where N is equal to the length of the binary data to be processed;
the flipping module 401 is configured to flip each bit of data of the binary data to be processed with a preset probability, where the preset probability is one of the N probability data;
the sending module 402 is configured to send the flipped binary data to be processed to a server, so that the server performs mean value statistics on the binary data to be processed.
The client according to the embodiment of the present invention is used for executing the technical solution of the data mean statistical method embodiment shown in fig. 1, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 5 is a schematic diagram of a server apparatus according to an embodiment of the present invention, and as shown in fig. 5, a server according to an embodiment of the present invention includes a receiving module 500 and a counting module 501;
the receiving module 500 is configured to receive binary data sent by a client, where the binary data is obtained by the client flipping each bit of data of the binary data to be processed with a preset probability, the preset probability is one of N different or partially different probability data, and a value of each probability data is within a range from 0 to 1, where N is equal to a length of the binary data to be processed;
the counting module 501 is configured to count, based on a target data bit, a frequency of 1 appearing on the target data bit of all binary data to be processed, where the frequency is used as an average value of the target data bit of the binary data.
The server according to the embodiment of the present invention is used to execute the technical solution of the data mean statistical method embodiment shown in Fig. 2; its implementation principle and technical effect are similar and are not described again here.
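The counting module described above can be sketched as a per-position frequency count over the received reports. The function names are hypothetical. The second helper, which combines the per-bit frequencies into a numeric mean by weighting each bit position and multiplying back by the common factor, is an assumption added for illustration and is not stated in this section.

```python
def bit_frequencies(reports: list[list[int]]) -> list[float]:
    """For each bit position, return the fraction of reports whose bit is 1."""
    if not reports:
        raise ValueError("at least one report is required")
    n_bits = len(reports[0])
    if any(len(report) != n_bits for report in reports):
        raise ValueError("all reports must have the same length")
    total = len(reports)
    return [sum(report[i] for report in reports) / total for i in range(n_bits)]


def mean_from_frequencies(frequencies: list[float], common_factor: int) -> float:
    """Assumed reconstruction: weight each bit mean by its place value."""
    n_bits = len(frequencies)
    # Bits are ordered high to low, so position i has weight 2**(n_bits - 1 - i).
    weighted = sum(f * 2 ** (n_bits - 1 - i) for i, f in enumerate(frequencies))
    return common_factor * weighted


# Two unperturbed reports encoding 1200 and 400 with common factor 100.
freqs = bit_frequencies([[1, 1, 0, 0], [0, 1, 0, 0]])
print(freqs)                              # [0.5, 1.0, 0.0, 0.0]
print(mean_from_frequencies(freqs, 100))  # 800.0
```

With perturbed reports the observed frequencies include the flipping noise, so the value computed this way is an estimate; any correction for the known flip probabilities is outside what this sketch shows.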
Fig. 6 is a schematic diagram of an electronic device framework according to an embodiment of the present invention. Referring to Fig. 6, an embodiment of the present invention provides an electronic device, including: a processor 610, a communications interface 620, a memory 630, and a bus 640, wherein the processor 610, the communications interface 620, and the memory 630 communicate with each other through the bus 640. The processor 610 may call logic instructions in the memory 630 to perform a method comprising: determining N probability data, wherein the N probability data are set to be different in value or partially the same in value, and the value of each probability data is in the range of 0 to 1, wherein N is equal to the length of the binary data to be processed; flipping each bit of data of the binary data to be processed with a preset probability, wherein the preset probability is one of the N probability data; and sending the flipped binary data to be processed to a server so that the server performs mean value statistics on the binary data to be processed. Alternatively, the method comprises: receiving binary data sent by a client, wherein the binary data is obtained by the client flipping each bit of data of the binary data to be processed with a preset probability, the preset probability is one of N different or partially different probability data, and the value of each probability data is within a range from 0 to 1, wherein N is equal to the length of the binary data to be processed; and counting, based on a target data bit, the frequency of 1 appearing on the target data bit of all the binary data to be processed, wherein the frequency is used as the average value of the target data bit of the binary data.
An embodiment of the present invention discloses a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, enable the computer to execute the methods provided by the above method embodiments. For example, the method comprises: determining N probability data, wherein the N probability data are set to be different in value or partially the same in value, and the value of each probability data is in the range of 0 to 1, wherein N is equal to the length of the binary data to be processed; flipping each bit of data of the binary data to be processed with a preset probability, wherein the preset probability is one of the N probability data; and sending the flipped binary data to be processed to a server so that the server performs mean value statistics on the binary data to be processed. Alternatively, the method comprises: receiving binary data sent by a client, wherein the binary data is obtained by the client flipping each bit of data of the binary data to be processed with a preset probability, the preset probability is one of N different or partially different probability data, and the value of each probability data is within a range from 0 to 1, wherein N is equal to the length of the binary data to be processed; and counting, based on a target data bit, the frequency of 1 appearing on the target data bit of all the binary data to be processed, wherein the frequency is used as the average value of the target data bit of the binary data.
An embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments. For example, the method comprises: determining N probability data, wherein the N probability data are set to be different in value or partially the same in value, and the value of each probability data is in the range of 0 to 1, wherein N is equal to the length of the binary data to be processed; flipping each bit of data of the binary data to be processed with a preset probability, wherein the preset probability is one of the N probability data; and sending the flipped binary data to be processed to a server so that the server performs mean value statistics on the binary data to be processed. Alternatively, the method comprises: receiving binary data sent by a client, wherein the binary data is obtained by the client flipping each bit of data of the binary data to be processed with a preset probability, the preset probability is one of N different or partially different probability data, and the value of each probability data is within a range from 0 to 1, wherein N is equal to the length of the binary data to be processed; and counting, based on a target data bit, the frequency of 1 appearing on the target data bit of all the binary data to be processed, wherein the frequency is used as the average value of the target data bit of the binary data.
Those of ordinary skill in the art will understand that: the implementation of the above-described apparatus embodiments or method embodiments is merely illustrative, wherein the processor and the memory may or may not be physically separate components, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware. With this understanding in mind, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a USB disk, a removable hard disk, a ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A data mean value statistical method for privacy protection, applied to a client, characterized by comprising the following steps:
determining N probability data, wherein the N probability data are set to be different in value or partially the same in value, and the value of each probability data is in the range of 0 to 1, wherein N is equal to the length of the binary data to be processed;
flipping each bit of data of the binary data to be processed with a preset probability, wherein the preset probability is one of the N probability data;
and sending the flipped binary data to be processed to a server so that the server performs mean value statistics on the binary data to be processed.
2. The method according to claim 1, wherein the flipping each bit of data of the binary data to be processed with a preset probability comprises:
flipping each bit of data of the binary data to be processed with a preset probability, wherein the preset probabilities corresponding to the bits of the binary data to be processed increase successively from the high-order bit to the low-order bit.
3. The method according to claim 1, wherein the flipping each bit of data of the binary data to be processed with a preset probability comprises:
flipping each bit of data of the binary data to be processed with a preset probability, wherein the preset probabilities of one or more specific bits of the binary data to be processed correspond to one or more specific probabilities, and the preset probabilities corresponding to the remaining bits of the binary data to be processed increase successively from the high-order bit to the low-order bit.
4. A method according to any one of claims 1 to 3, characterized in that said binary data to be processed are obtained by:
determining a common factor, wherein the common factor is a positive integer;
dividing the numerical data by a common factor to obtain an integer quotient;
and converting the integer quotient into binary data, wherein the bit number of the binary data is determined according to the number of the numerical data.
5. A data mean value statistical method for privacy protection, applied to a server, characterized by comprising the following steps:
receiving binary data sent by a client, wherein the binary data is obtained by the client flipping each bit of data of the binary data to be processed with a preset probability, the preset probability is one of N different or partially different probability data, and the value of each probability data is within a range from 0 to 1, wherein N is equal to the length of the binary data to be processed;
and counting, based on a target data bit, the frequency of 1 appearing on the target data bit of all the binary data to be processed, wherein the frequency is used as the average value of the target data bit of the binary data.
6. The method according to claim 5, wherein the binary data being obtained by the client flipping each bit of data of the binary data to be processed with a preset probability specifically comprises:
the binary data is obtained by the client flipping each bit of data of the binary data to be processed with a preset probability, wherein the preset probabilities corresponding to the bits of the binary data to be processed increase successively from the high-order bit to the low-order bit.
7. The method according to claim 6, wherein the binary data being obtained by the client flipping each bit of data of the binary data to be processed with a preset probability specifically comprises:
the binary data is obtained by the client flipping each bit of data of the binary data to be processed with a preset probability, wherein the preset probabilities of one or more specific bits of the binary data to be processed correspond to one or more specific probabilities, and the preset probabilities corresponding to the remaining bits of the binary data to be processed increase successively from the high-order bit to the low-order bit.
8. The method according to any of claims 5 to 7, wherein the binary data to be processed is obtained by:
determining a common factor, wherein the common factor is a positive integer;
dividing the numerical data by a common factor to obtain an integer quotient;
and converting the integer quotient into binary data, wherein the bit number of the binary data is determined according to the number of the numerical data.
9. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor being capable of invoking the program instructions to perform the method of any of claims 1 to 8.
10. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1 to 8.
CN202110198079.3A 2021-02-22 2021-02-22 Data mean value statistical method and device for privacy protection Pending CN114969800A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110198079.3A CN114969800A (en) 2021-02-22 2021-02-22 Data mean value statistical method and device for privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110198079.3A CN114969800A (en) 2021-02-22 2021-02-22 Data mean value statistical method and device for privacy protection

Publications (1)

Publication Number Publication Date
CN114969800A true CN114969800A (en) 2022-08-30

Family

ID=82954479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110198079.3A Pending CN114969800A (en) 2021-02-22 2021-02-22 Data mean value statistical method and device for privacy protection

Country Status (1)

Country Link
CN (1) CN114969800A (en)

Similar Documents

Publication Publication Date Title
US10516638B2 (en) Techniques to select and prioritize application of junk email filtering rules
US10547618B2 (en) Method and apparatus for setting access privilege, server and storage medium
CN108667770B (en) Website vulnerability testing method, server and system
US11368478B2 (en) System for detecting and preventing malware execution in a target system
CN107240029B (en) Data processing method and device
EP3352121A1 (en) Content delivery method and device
US20130339456A1 (en) Techniques to filter electronic mail based on language and country of origin
US10419525B2 (en) Server-based system, method, and computer program product for scanning data on a client using only a subset of the data
US11048680B2 (en) Hive table scanning method, device, computer apparatus and storage medium
US20170155712A1 (en) Method and device for updating cache data
CN111090621A (en) Log obtaining method, device and storage medium
CN110196805B (en) Data processing method, data processing apparatus, storage medium, and electronic apparatus
CN114969800A (en) Data mean value statistical method and device for privacy protection
CN112182520B (en) Identification method and device of illegal account number, readable medium and electronic equipment
CN107124353B (en) Message processing method and device, computer device and storage medium
EP3694177B1 (en) System for detecting and preventing malware execution in a target system
CN110557324B (en) Unread IM message processing method and device
CN114090927A (en) Page loading method and device, computer equipment and storage medium
CN114239963A (en) Method and device for detecting directed graph circulation path
CN109962920B (en) Method, device and system for determining split page number
US9235647B1 (en) Systems and methods for predictive responses to internet object queries
CN109729185B (en) Short message port number identification method and device based on website
CN114520741B (en) Information pushing method, related equipment and system
CN112468471B (en) Remote authorization method and device
CN110223109B (en) Online shopping method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination