CN112995076A - Discrete data frequency estimation method, user side, data center and system - Google Patents


Info

Publication number
CN112995076A
Authority
CN
China
Prior art keywords
discrete data
data
codes
discrete
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911298496.4A
Other languages
Chinese (zh)
Other versions
CN112995076B (en)
Inventor
刘莹
朱洪斌
刘圣龙
赵涛
王衡
周鑫
王迪
毛一凡
崔硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Big Data Center Of State Grid Corp Of China
Original Assignee
Big Data Center Of State Grid Corp Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Big Data Center Of State Grid Corp Of China
Priority to CN201911298496.4A
Publication of CN112995076A
Application granted
Publication of CN112995076B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L25/00 Baseband systems
    • H04L25/02 Details; arrangements for supplying electrical power along data transmission lines
    • H04L25/08 Modifications for reducing interference; Modifications for reducing effects due to line faults; Receiver end arrangements for detecting or overcoming line faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to a discrete data frequency estimation method, a user side, a data center and a system. The method comprises the following steps: the user side generates a discrete data code according to the type of the discrete data sent to the data center; the user side obtains the scrambling code corresponding to the discrete data code and sends it to the data center; the data center receives the scrambling codes corresponding to the discrete data codes of the user sides; and the data center determines the occurrence frequency of each type of discrete data according to those scrambling codes. In this scheme, the user side reduces the noise injected into the original data in accordance with the definition of loose local differential privacy, keeps data distortion as low as possible while still satisfying local differential privacy, improves the usability of the scrambled data, and thereby improves the accuracy of the statistical result.

Description

Discrete data frequency estimation method, user side, data center and system
Technical Field
The invention relates to the field of power grid information control, in particular to a discrete data frequency estimation method, a user side, a data center and a system.
Background
In the field of production control, including but not limited to power grid information control, business data from different regions and different departments often needs to be collected at a data center, where joint analysis yields the occurrence frequency of a given business event for further business analysis. This involves a separation of data ownership from the right to use the data: the data owners are spread across different regions and departments while the analysis results can be shared, so the joint data analysis must be carried out while keeping each party's data confidential.
At present, business data from different departments in the same region is collected directly at the data center, which creates a risk of sensitive data leakage; the data center, as the key node in the joint work of all parties, bears a heavy responsibility for data security protection. In addition, in order to keep their data secure and avoid liability for data security, the parties become much less willing to share data, which hinders the development of data services. A technology is therefore urgently needed in which each independent party applies local differential privacy processing to its own data and joint analysis is carried out while each party's data privacy is protected.
Disclosure of Invention
To address the shortcomings of the prior art, the invention aims to have the user side reduce the noise injected into the original data in accordance with the definition of loose local differential privacy, keeping data distortion as low as possible while still satisfying local differential privacy, improving the usability of the scrambled data, and thereby improving the accuracy of the statistical result.
The purpose of the invention is realized by adopting the following technical scheme:
the invention provides a discrete data frequency estimation method, which is applied to a user terminal, and the improvement is that the method comprises the following steps:
generating discrete data codes according to the types of the discrete data sent to the data center;
and acquiring a scrambling code corresponding to the discrete data code, and sending the scrambling code corresponding to the discrete data code to a data center.
Preferably, the length of the discrete data codes is equal to the total number of discrete data types.
Further, the discrete data is encoded as $(v_1, \ldots, v_i, \ldots, v_n)$, where n is the total number of discrete data types and $v_i$ is the coded value corresponding to the i-th type of discrete data; if the type of the discrete data sent to the data center by the user side is the i-th type of discrete data, then $v_i = 1$; otherwise $v_i = 0$.
Preferably, the obtaining of the scrambling code corresponding to the discrete data code includes:
acquiring the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
and determining the scrambling codes corresponding to the discrete data codes based on the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes.
Further, the obtaining of the conversion probability of the code value corresponding to each type of discrete data in the discrete data coding includes:
determining the probability of converting the coded value corresponding to the ith type of discrete data in the discrete data coding into 0 according to the following formula:
Figure BDA0002321239460000021
determining the probability of converting the coded value corresponding to the ith type of discrete data into 1 in the discrete data coding according to the following formula:
Figure BDA0002321239460000022
In the above formulas, ε is the privacy protection budget; δ is the parameter of loose local differential privacy, with a value between 0 and 1; [Figure BDA0002321239460000023] is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code; [Figure BDA0002321239460000024] is the probability of converting the coded value corresponding to the i-th type of discrete data in the discrete data code into 0; and [Figure BDA0002321239460000025] is the probability of converting that coded value into 1.
Further, the determining, based on the transition probabilities of the code values corresponding to various types of discrete data in the discrete data codes, a scrambling code corresponding to the discrete data codes includes:
drawing a value from the set {0,1}, where 0 is drawn with probability [Figure BDA0002321239460000026] and 1 is drawn with probability [Figure BDA0002321239460000027]; if 0 is drawn, then [Figure BDA0002321239460000028]; if 1 is drawn, then [Figure BDA0002321239460000029].
The invention provides a user terminal applied to discrete data frequency estimation, and the improvement is that the user terminal comprises:
the generating module is used for generating discrete data codes according to the types of the discrete data sent to the data center;
the acquisition module is used for acquiring a scrambling code corresponding to the discrete data code;
and the sending module is used for sending the scrambling codes corresponding to the discrete data codes to the data center.
Preferably, the length of the discrete data codes is equal to the total number of discrete data types.
Further, the discrete data is encoded as $(v_1, \ldots, v_i, \ldots, v_n)$, where n is the total number of discrete data types and $v_i$ is the coded value corresponding to the i-th type of discrete data; if the type of the discrete data sent to the data center by the user side is the i-th type of discrete data, then $v_i = 1$; otherwise $v_i = 0$.
Preferably, the obtaining module includes:
the acquisition unit is used for acquiring the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
and the determining unit is used for determining the scrambling codes corresponding to the discrete data codes based on the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes.
Further, the obtaining unit is specifically configured to:
determining the probability of converting the coded value corresponding to the ith type of discrete data in the discrete data coding into 0 according to the following formula:
Figure BDA0002321239460000031
determining the probability of converting the coded value corresponding to the ith type of discrete data into 1 in the discrete data coding according to the following formula:
Figure BDA0002321239460000032
In the above formulas, ε is the privacy protection budget; δ is the parameter of loose local differential privacy, with a value between 0 and 1; [Figure BDA0002321239460000033] is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code; [Figure BDA0002321239460000034] is the probability of converting the coded value corresponding to the i-th type of discrete data in the discrete data code into 0; and [Figure BDA0002321239460000035] is the probability of converting that coded value into 1.
Further, the determining unit is specifically configured to:
drawing a value from the set {0,1}, where 0 is drawn with probability [Figure BDA0002321239460000036] and 1 is drawn with probability [Figure BDA0002321239460000037]; if 0 is drawn, then [Figure BDA0002321239460000038]; if 1 is drawn, then [Figure BDA0002321239460000039].
The invention provides a discrete data frequency estimation method, which is applied to a data center, and the improvement is that the method comprises the following steps:
receiving a scrambling code corresponding to the discrete data code of each user side;
and determining the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
Preferably, the determining the occurrence frequency of each type of discrete data according to the scrambling code corresponding to the discrete data code of each user side includes:
counting, among the scrambling codes corresponding to the discrete data codes of the user sides, the frequency [Figure BDA0002321239460000041] with which the scrambling code value corresponding to the i-th type of discrete data is 0 and the frequency [Figure BDA0002321239460000042] with which it is 1;
establishing, based on [Figure BDA0002321239460000043] and [Figure BDA0002321239460000044], the occurrence frequency equation set of the i-th type of discrete data;
and solving the occurrence frequency equation set of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data.
Further, the occurrence frequency equation set of the i-th type of discrete data is as follows:
Figure BDA0002321239460000045
In the above formula, $f_0(i)$ is the frequency with which the i-th type of discrete data does not occur, $f_1(i)$ is the occurrence frequency of the i-th type of discrete data, ε is the privacy protection budget, and δ is the parameter of loose local differential privacy, with a value between 0 and 1.
The present invention provides a data center for use in discrete data frequency estimation, the improvement wherein the data center comprises:
the receiving module is used for receiving the scrambling codes corresponding to the discrete data codes of the user sides;
and the determining module is used for determining the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
Preferably, the determining module includes:
a statistics unit, configured to count, among the scrambling codes corresponding to the discrete data codes of the user sides, the frequency [Figure BDA0002321239460000046] with which the scrambling code value corresponding to the i-th type of discrete data is 0 and the frequency [Figure BDA0002321239460000047] with which it is 1;
a building unit, configured to establish, based on [Figure BDA0002321239460000048] and [Figure BDA0002321239460000049], the occurrence frequency equation set of the i-th type of discrete data;
and a solving unit, configured to solve the occurrence frequency equation set of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data.
Further, the occurrence frequency equation set of the i-th type of discrete data is as follows:
Figure BDA00023212394600000410
In the above formula, $f_0(i)$ is the frequency with which the i-th type of discrete data does not occur, $f_1(i)$ is the occurrence frequency of the i-th type of discrete data, ε is the privacy protection budget, and δ is the parameter of loose local differential privacy, with a value between 0 and 1.
The invention provides a method for estimating discrete data frequency, the improvement is that the method comprises the following steps:
the user side generates discrete data codes according to the types of the discrete data sent to the data center;
the user side obtains the scrambling code corresponding to the discrete data code and sends the scrambling code corresponding to the discrete data code to the data center;
the data center receives the scrambling codes corresponding to the discrete data codes of the user sides;
and the data center determines the occurrence frequency of each type of discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
The present invention provides a discrete data frequency estimation system, the improvement wherein said system comprises: the user side and the data center.
Compared with the closest prior art, the invention has the following beneficial effects:
In the technical scheme provided by the invention, the user side generates a discrete data code according to the type of the discrete data sent to the data center, randomly scrambles the code values corresponding to the various types of discrete data in the discrete data code, and sends the scrambling code to the data center; the data processed in this way meets the privacy requirement, and the risk of privacy disclosure is avoided.
After the data center receives the scrambling codes corresponding to the discrete data codes of the user sides, it determines the occurrence frequency of each type of discrete data according to those scrambling codes.
Drawings
FIG. 1 is a flow chart of a method for estimating a frequency of discrete data according to the present invention;
FIG. 2 is a schematic diagram of the structure of a user side applied to the discrete data frequency estimation method provided by the present invention;
FIG. 3 is a schematic diagram of a data center structure applied to a discrete data frequency estimation method provided by the present invention;
FIG. 4 is a schematic structural diagram of a discrete data frequency estimation system provided by the present invention.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To carry out joint data analysis while keeping each party's data confidential, the discrete data frequency estimation method provided by the invention introduces the definition of loose local differential privacy on top of the existing scheme and provides a discrete data frequency estimation scheme that satisfies loose local differential privacy. The main idea of the scheme is that the user side reduces the noise injected into the original data in accordance with the definition of loose local differential privacy, keeping data distortion as low as possible while still satisfying local differential privacy, which improves the usability of the scrambled data and thereby the accuracy of the statistical result. As shown in fig. 1, the method includes:
Step 101: the user side generates a discrete data code according to the type of the discrete data sent to the data center;
Step 102: the user side obtains the scrambling code corresponding to the discrete data code and sends the scrambling code corresponding to the discrete data code to the data center;
Step 103: the data center receives the scrambling codes corresponding to the discrete data codes of the user sides;
Step 104: the data center determines the occurrence frequency of each type of discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
Wherein the length of the discrete data code is equal to the total number of the discrete data types.
The discrete data is encoded as $(v_1, \ldots, v_i, \ldots, v_n)$, where n is the total number of discrete data types and $v_i$ is the coded value corresponding to the i-th type of discrete data; if the type of the discrete data sent to the data center by the user side is the i-th type of discrete data, then $v_i = 1$; otherwise $v_i = 0$.
For example: each user side holds one item of discrete data from the discrete data set S. Each user side first applies one-hot encoding to its own data $d_i$, obtaining a unit vector $v_i$ of length m in which only the position corresponding to $d_i$ is 1 and all other positions are 0. Specifically, if $d_i$ is the j-th item (j ≤ m) in the discrete data set, then the j-th bit of the unit vector $v_i$ is 1 and the remaining bits are 0.
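The one-hot encoding step in this example can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function name one_hot_encode and the category names in CATEGORIES are hypothetical, and the categories are assumed to be distinct.

```python
from typing import List, Sequence


def one_hot_encode(value: str, categories: Sequence[str]) -> List[int]:
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


# Hypothetical discrete data set S with m = 4 distinct categories.
CATEGORIES = ["outage", "overload", "maintenance", "normal"]

code = one_hot_encode("maintenance", CATEGORIES)
print(code)  # prints [0, 0, 1, 0]
```

The length of the resulting vector equals the number of categories, matching the statement that the length of the discrete data code equals the total number of discrete data types.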
Specifically, in the embodiment provided by the present invention, step 101 and step 102 may be applied to the user side, where in step 102, acquiring the scrambling code corresponding to the discrete data code includes:
acquiring the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
and determining the scrambling codes corresponding to the discrete data codes based on the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes.
Further, the obtaining of the conversion probability of the code value corresponding to each type of discrete data in the discrete data coding includes:
determining the probability of converting the coded value corresponding to the ith type of discrete data in the discrete data coding into 0 according to the following formula:
Figure BDA0002321239460000071
determining the probability of converting the coded value corresponding to the ith type of discrete data into 1 in the discrete data coding according to the following formula:
Figure BDA0002321239460000072
In the above formulas, ε is the privacy protection budget; δ is the parameter of loose local differential privacy, with a value between 0 and 1; [Figure BDA0002321239460000073] is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code; [Figure BDA0002321239460000074] is the probability of converting the coded value corresponding to the i-th type of discrete data in the discrete data code into 0; and [Figure BDA0002321239460000075] is the probability of converting that coded value into 1.
Here, δ is generally greater than 0 and much smaller than 1; when δ is 0, the privacy protection mechanism satisfies local differential privacy under the strict definition. This application mainly discusses a discrete data frequency estimation method that satisfies loose local differential privacy.
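For reference, the loose (relaxed) definition is commonly written as the inequality below. The notation is introduced here and is not taken from the source: M denotes the randomized scrambling mechanism run on the user side, v and v' are any two possible inputs, and S is any set of possible outputs. Setting δ = 0 recovers the strict definition.

```latex
\Pr[M(v) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(v') \in S] + \delta
```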
Further, the determining, based on the transition probabilities of the code values corresponding to various types of discrete data in the discrete data codes, a scrambling code corresponding to the discrete data codes includes:
drawing a value from the set {0,1}, where 0 is drawn with probability [Figure BDA0002321239460000076] and 1 is drawn with probability [Figure BDA0002321239460000077]; if 0 is drawn, then [Figure BDA0002321239460000078]; if 1 is drawn, then [Figure BDA0002321239460000079].
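The per-position draw from {0,1} can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the transition probabilities in the source are given only as formula images that are not reproduced here, so the sketch takes them as the hypothetical parameters p_one_given_one and p_one_given_zero. The function name perturb_code is also illustrative.

```python
import random
from typing import List, Optional, Sequence


def perturb_code(
    code: Sequence[int],
    p_one_given_one: float,
    p_one_given_zero: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Scramble each position of a 0/1 code independently.

    p_one_given_one:  probability that an original 1 is reported as 1.
    p_one_given_zero: probability that an original 0 is reported as 1.
    Both are placeholders (assumptions) standing in for the transition
    probabilities defined by the formula images in the source.
    """
    if rng is None:
        rng = random.Random()
    scrambled = []
    for bit in code:
        p_one = p_one_given_one if bit == 1 else p_one_given_zero
        # Draw from {0, 1}: 1 with probability p_one, otherwise 0.
        scrambled.append(1 if rng.random() < p_one else 0)
    return scrambled


# Example call with made-up probabilities; the result is random.
report = perturb_code([0, 0, 1, 0], p_one_given_one=0.75, p_one_given_zero=0.25)
```

Only the scrambled vector leaves the user side, so the data center never sees the original code.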
Based on the technical solutions of step 101 and step 102, the present invention provides a user side for discrete data frequency estimation; as shown in fig. 2, the user side includes:
the generating module is used for generating discrete data codes according to the types of the discrete data sent to the data center;
the acquisition module is used for acquiring a scrambling code corresponding to the discrete data code;
and the sending module is used for sending the scrambling codes corresponding to the discrete data codes to the data center.
Preferably, the length of the discrete data codes is equal to the total number of discrete data types.
Further, the discrete data is encoded as $(v_1, \ldots, v_i, \ldots, v_n)$, where n is the total number of discrete data types and $v_i$ is the coded value corresponding to the i-th type of discrete data; if the type of the discrete data sent to the data center by the user side is the i-th type of discrete data, then $v_i = 1$; otherwise $v_i = 0$.
Preferably, the obtaining module includes:
the acquisition unit is used for acquiring the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
and the determining unit is used for determining the scrambling codes corresponding to the discrete data codes based on the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes.
Further, the obtaining unit is specifically configured to:
determining the probability of converting the coded value corresponding to the ith type of discrete data in the discrete data coding into 0 according to the following formula:
Figure BDA0002321239460000081
determining the probability of converting the coded value corresponding to the ith type of discrete data into 1 in the discrete data coding according to the following formula:
Figure BDA0002321239460000082
In the above formulas, ε is the privacy protection budget; δ is the parameter of loose local differential privacy, with a value between 0 and 1; [Figure BDA0002321239460000083] is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code; [Figure BDA0002321239460000084] is the probability of converting the coded value corresponding to the i-th type of discrete data in the discrete data code into 0; and [Figure BDA0002321239460000085] is the probability of converting that coded value into 1.
Further, the determining unit is specifically configured to:
drawing a value from the set {0,1}, where 0 is drawn with probability [Figure BDA0002321239460000086] and 1 is drawn with probability [Figure BDA0002321239460000087]; if 0 is drawn, then [Figure BDA0002321239460000088]; if 1 is drawn, then [Figure BDA0002321239460000089].
In the embodiment provided by the present invention, step 103 and step 104 may be applied to a data center, where step 104 includes:
counting, among the scrambling codes corresponding to the discrete data codes of the user sides, the frequency [Figure BDA00023212394600000810] with which the scrambling code value corresponding to the i-th type of discrete data is 0 and the frequency [Figure BDA00023212394600000811] with which it is 1;
establishing, based on [Figure BDA00023212394600000812] and [Figure BDA00023212394600000813], the occurrence frequency equation set of the i-th type of discrete data;
and solving the occurrence frequency equation set of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data.
Further, the occurrence frequency equation set of the i-th type of discrete data is as follows:
Figure BDA0002321239460000091
In the above formula, $f_0(i)$ is the frequency with which the i-th type of discrete data does not occur, $f_1(i)$ is the occurrence frequency of the i-th type of discrete data, ε is the privacy protection budget, and δ is the parameter of loose local differential privacy, with a value between 0 and 1.
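The counting and solving steps on the data center side can be sketched in Python. This is a generic sketch under stated assumptions rather than the patent's equation set, which appears in the source only as a formula image: it assumes each position is scrambled independently, with a hypothetical probability p that an original 1 is reported as 1 and a hypothetical probability q that an original 0 is reported as 1, and it works with counts rather than proportions. The names estimate_occurrence and estimate_all are illustrative.

```python
from typing import List, Sequence, Tuple


def estimate_occurrence(
    ones_observed: int, n_users: int, p: float, q: float
) -> Tuple[float, float]:
    """Solve the 2x2 linear system for (f0, f1):

        f1 * p + f0 * q = ones_observed
        f1 + f0         = n_users

    p = Pr[report 1 | original 1] and q = Pr[report 1 | original 0]
    are assumed inputs, not the patent's formulas.
    """
    if p == q:
        raise ValueError("p and q must differ for the system to be solvable")
    f1 = (ones_observed - n_users * q) / (p - q)
    f0 = n_users - f1
    return f0, f1


def estimate_all(
    scrambled_codes: Sequence[Sequence[int]], p: float, q: float
) -> List[float]:
    """Estimate the occurrence count of every data type from all reports."""
    if not scrambled_codes:
        raise ValueError("at least one scrambled code is required")
    n_users = len(scrambled_codes)
    # Column sums: how many users reported a 1 at each position.
    ones_per_type = [sum(column) for column in zip(*scrambled_codes)]
    return [estimate_occurrence(c, n_users, p, q)[1] for c in ones_per_type]
```

Counting the zeros separately is not needed in this form, because the number of zeros at a position is the number of users minus the number of ones. With finite samples the estimates can fall slightly below 0 or above the number of users; the sketch leaves them unclipped.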
Based on the technical solutions of step 103 and step 104, the present invention provides a data center for discrete data frequency estimation, as shown in fig. 3, the data center includes:
the receiving module is used for receiving the scrambling codes corresponding to the discrete data codes of the user sides;
and the determining module is used for determining the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
Preferably, the determining module includes:
a statistics unit, configured to count, among the scrambling codes corresponding to the discrete data codes of the user sides, the frequency [Figure BDA0002321239460000092] with which the scrambling code value corresponding to the i-th type of discrete data is 0 and the frequency [Figure BDA0002321239460000093] with which it is 1;
a building unit, configured to establish, based on [Figure BDA0002321239460000094] and [Figure BDA0002321239460000095], the occurrence frequency equation set of the i-th type of discrete data;
and a solving unit, configured to solve the occurrence frequency equation set of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data.
Further, the occurrence frequency equation set of the i-th type of discrete data is as follows:
Figure BDA0002321239460000096
In the above formula, $f_0(i)$ is the frequency with which the i-th type of discrete data does not occur, $f_1(i)$ is the occurrence frequency of the i-th type of discrete data, ε is the privacy protection budget, and δ is the parameter of loose local differential privacy, with a value between 0 and 1.
Meanwhile, the present invention also provides a discrete data frequency estimation system, as shown in fig. 4, the system includes: the user side and the data center.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalents may be made to the embodiments of the invention without departing from its spirit and scope, and that all such modifications and equivalents are to be covered by the claims.

Claims (20)

1. A method for estimating discrete data frequency, the method being applied to a user side, the method comprising:
generating discrete data codes according to the types of the discrete data sent to the data center;
and acquiring a scrambling code corresponding to the discrete data code, and sending the scrambling code corresponding to the discrete data code to a data center.
2. The method of claim 1, wherein the length of the discrete data encoding is equal to a total number of discrete data types.
3. The method of claim 2, wherein the discrete data code is $(v_1, \ldots, v_i, \ldots, v_n)$, where $n$ is the total number of discrete data types and $v_i$ is the code value corresponding to the i-th type of discrete data; if the type of the discrete data sent to the data center by the user side is the i-th type, then $v_i = 1$; otherwise, $v_i = 0$.
4. The method of claim 1, wherein obtaining the scrambling code corresponding to the discrete data code comprises:
acquiring the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
and determining the scrambling codes corresponding to the discrete data codes based on the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes.
5. The method of claim 4, wherein obtaining transition probabilities for code values corresponding to various types of discrete data in the encoding of discrete data comprises:
determining the probability of converting the coded value corresponding to the ith type of discrete data in the discrete data coding into 0 according to the following formula:
Figure FDA0002321239450000011
determining the probability of converting the coded value corresponding to the ith type of discrete data into 1 in the discrete data coding according to the following formula:
Figure FDA0002321239450000012
in the above formulas, ε is the privacy protection budget; δ is a parameter under loose local differential privacy, taking a value between 0 and 1;
Figure FDA0002321239450000013
is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code;
Figure FDA0002321239450000014
is the probability of converting the code value corresponding to the i-th type of discrete data in the discrete data code into 0; and
Figure FDA0002321239450000015
is the probability of converting the code value corresponding to the i-th type of discrete data in the discrete data code into 1.
6. The method of claim 5, wherein determining the scrambling code corresponding to the discrete data encoding based on transition probabilities of encoding values corresponding to various types of discrete data in the discrete data encoding comprises:
drawing a value from the set {0,1}, where 0 is drawn with probability
Figure FDA0002321239450000021
and 1 is drawn with probability
Figure FDA0002321239450000022
wherein, if 0 is drawn, then
Figure FDA0002321239450000023
and, if 1 is drawn, then
Figure FDA0002321239450000024
7. A user terminal for discrete data frequency estimation, the user terminal comprising:
the generating module is used for generating discrete data codes according to the types of the discrete data sent to the data center;
the acquisition module is used for acquiring a scrambling code corresponding to the discrete data code;
and the sending module is used for sending the scrambling codes corresponding to the discrete data codes to the data center.
8. The user terminal of claim 7, wherein the length of the discrete data codes is equal to the total number of discrete data types.
9. The user terminal of claim 8, wherein the discrete data code is $(v_1, \ldots, v_i, \ldots, v_n)$, where $n$ is the total number of discrete data types and $v_i$ is the code value corresponding to the i-th type of discrete data; if the type of the discrete data sent to the data center by the user terminal is the i-th type, then $v_i = 1$; otherwise, $v_i = 0$.
10. The user terminal according to claim 7, wherein the acquisition module includes:
the acquisition unit is used for acquiring the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
and the determining unit is used for determining the scrambling codes corresponding to the discrete data codes based on the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes.
11. The user terminal according to claim 10, wherein the acquisition unit is specifically configured for:
determining the probability of converting the coded value corresponding to the ith type of discrete data in the discrete data coding into 0 according to the following formula:
Figure FDA0002321239450000025
determining the probability of converting the coded value corresponding to the ith type of discrete data into 1 in the discrete data coding according to the following formula:
Figure FDA0002321239450000026
in the above formulas, ε is the privacy protection budget; δ is a parameter under loose local differential privacy, taking a value between 0 and 1;
Figure FDA0002321239450000027
is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code;
Figure FDA0002321239450000031
is the probability of converting the code value corresponding to the i-th type of discrete data in the discrete data code into 0; and
Figure FDA0002321239450000032
is the probability of converting the code value corresponding to the i-th type of discrete data in the discrete data code into 1.
12. The user terminal according to claim 11, wherein the determining unit is specifically configured for:
drawing a value from the set {0,1}, where 0 is drawn with probability
Figure FDA0002321239450000033
and 1 is drawn with probability
Figure FDA0002321239450000034
wherein, if 0 is drawn, then
Figure FDA0002321239450000035
and, if 1 is drawn, then
Figure FDA0002321239450000036
13. A discrete data frequency estimation method applied to a data center is characterized by comprising the following steps:
receiving a scrambling code corresponding to the discrete data code of each user side;
and determining the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
14. The method of claim 13, wherein the determining the occurrence frequency of each type of discrete data according to the scrambling code corresponding to the discrete data code of each user terminal comprises:
counting, among the scrambling codes corresponding to the discrete data codes of the user terminals, the frequency with which the scrambling code value corresponding to the i-th type of discrete data is 0,
Figure FDA0002321239450000037
and the frequency with which that scrambling code value is 1,
Figure FDA0002321239450000038
establishing, based on
Figure FDA0002321239450000039
and
Figure FDA00023212394500000310
a system of equations for the occurrence frequency of the i-th type of discrete data;
and solving the system of equations for the occurrence frequency of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data.
15. The method of claim 14, wherein the system of equations for the occurrence frequency of the i-th type of discrete data is:
Figure FDA00023212394500000311
in the above formula, $f_0(i)$ is the frequency with which the i-th type of discrete data does not occur, $f_1(i)$ is the occurrence frequency of the i-th type of discrete data, ε is the privacy protection budget, and δ is a parameter under loose local differential privacy, taking a value between 0 and 1.
16. A data center for use in discrete data frequency estimation, the data center comprising:
the receiving module is used for receiving the scrambling codes corresponding to the discrete data codes of the user sides;
and the determining module is used for determining the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
17. The data center of claim 16, wherein the determination module comprises:
a statistics unit, used for counting, among the scrambling codes corresponding to the discrete data codes of the user terminals, the frequency with which the scrambling code value corresponding to the i-th type of discrete data is 0,
Figure FDA0002321239450000041
and the frequency with which that scrambling code value is 1,
Figure FDA0002321239450000042
a building unit, used for establishing, based on
Figure FDA0002321239450000043
and
Figure FDA0002321239450000044
a system of equations for the occurrence frequency of the i-th type of discrete data;
and a solving unit, used for solving the system of equations for the occurrence frequency of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data.
18. The data center of claim 17, wherein the system of equations for the occurrence frequency of the ith type of discrete data is:
Figure FDA0002321239450000045
in the above formula, $f_0(i)$ is the frequency with which the i-th type of discrete data does not occur, $f_1(i)$ is the occurrence frequency of the i-th type of discrete data, ε is the privacy protection budget, and δ is a parameter under loose local differential privacy, taking a value between 0 and 1.
19. A method of discrete data frequency estimation, the method comprising:
the user side generates discrete data codes according to the types of the discrete data sent to the data center;
the user side acquires a scrambling code corresponding to the discrete data code and sends the scrambling code corresponding to the discrete data code to the data center;
the data center receives the scrambling codes corresponding to the discrete data codes of the user sides;
and the data center determines the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
20. A discrete data frequency estimation system, the system comprising: the user terminal according to any of claims 7-12 and the data center according to any of claims 16-18.
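The flow recited in claims 1-6 and 13-15 (one-hot coding at the user side, per-bit random scrambling, and frequency recovery at the data center) can be illustrated with a minimal sketch. The transition-probability formulas and the equation system of the claims are given only as formula images that are not reproduced in this text, so the sketch substitutes an assumed technique: plain symmetric randomized response with keep probability p = e^ε / (1 + e^ε), omitting δ. All function names below are hypothetical and are not taken from the patent.

```python
import math
import random


def one_hot(type_index: int, n_types: int) -> list[int]:
    """Discrete data code of length n_types with v_i = 1 for the reported type."""
    return [1 if i == type_index else 0 for i in range(n_types)]


def keep_probability(epsilon: float) -> float:
    """ASSUMPTION: symmetric randomized response; the claimed formulas also use delta."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def scramble(code: list[int], epsilon: float, rng: random.Random) -> list[int]:
    """Keep each code value with probability p, flip it with probability 1 - p."""
    p = keep_probability(epsilon)
    return [v if rng.random() < p else 1 - v for v in code]


def estimate_frequencies(scrambled: list[list[int]], epsilon: float) -> list[float]:
    """For each type i, solve n1 = p*f1 + (1 - p)*f0 together with f0 + f1 = N for f1."""
    p = keep_probability(epsilon)
    total = len(scrambled)
    n_types = len(scrambled[0])
    estimates = []
    for i in range(n_types):
        n1 = sum(code[i] for code in scrambled)  # observed count of 1s at position i
        estimates.append((n1 - (1.0 - p) * total) / (2.0 * p - 1.0))
    return estimates


# Usage: 10,000 simulated user sides reporting one of three data types.
rng = random.Random(0)
true_types = [0] * 6000 + [1] * 3000 + [2] * 1000
reports = [scramble(one_hot(t, 3), epsilon=2.0, rng=rng) for t in true_types]
print([round(f) for f in estimate_frequencies(reports, epsilon=2.0)])
```

Under this assumed scheme the estimator is unbiased for any ε > 0, and its variance grows as ε approaches 0 because the divisor 2p - 1 shrinks; the claimed scheme with δ may behave differently.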
CN201911298496.4A 2019-12-17 2019-12-17 Discrete data frequency estimation method, user side, data center and system Active CN112995076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911298496.4A CN112995076B (en) 2019-12-17 2019-12-17 Discrete data frequency estimation method, user side, data center and system

Publications (2)

Publication Number Publication Date
CN112995076A (en) 2021-06-18
CN112995076B (en) 2022-09-27

Family

ID=76341887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911298496.4A Active CN112995076B (en) 2019-12-17 2019-12-17 Discrete data frequency estimation method, user side, data center and system

Country Status (1)

Country Link
CN (1) CN112995076B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant