CN115828194A - Data privacy protection method and detection method of privacy enhanced semi-blind digital fingerprint - Google Patents

Info

Publication number
CN115828194A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211459070.4A
Other languages
Chinese (zh)
Inventor
胡韵
张春玉
罗靖
江英华
王菽裕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xizang Minzu University
Original Assignee
Xizang Minzu University


Abstract

The invention discloses a data privacy protection method and a detection method based on privacy-enhanced semi-blind digital fingerprints. The methods combine differential privacy with digital fingerprinting by embedding carefully designed noise at specific positions of a numerical aggregated data set, so that fingerprint embedding and noise perturbation of the carrier data are accomplished in a single step. This simultaneously protects the private information in the carrier data set and supports traitor tracing.

Description

Data privacy protection method and detection method of privacy enhanced semi-blind digital fingerprint
Technical Field
The invention belongs to the technical field of data privacy protection in information security, and particularly relates to a carrier data privacy protection method based on privacy-enhanced semi-blind digital fingerprints.
Background
Access control and encryption are widely regarded as the main means of preventing unauthorized users from accessing data during transmission or interaction. A further concern is how to prevent authorized users from illegally redistributing private data after they have received and decrypted it. A growing body of research recognizes the importance of traitor tracing, in which digital fingerprinting, which embeds user-related fingerprint information into the original digital product so that the source of the data can be traced, plays a crucial role. Embedding a user's fingerprint information into carrier data such as images and videos is the main way digital fingerprinting is implemented. However, research on digital fingerprint schemes often neglects the privacy protection of the carrier data itself, which is clearly inadequate for today's complex and changeable network interaction environments.
Among existing digital fingerprint schemes that do address the privacy of carrier data, privacy protection and traitor tracing are mostly treated as two independent lines of research. Even within the same system model, the two are rarely fully integrated. The common approach is to add a tracking module after the privacy protection function has been completed, for example using a data sanitization mechanism or k-anonymization to protect the privacy of the carrier data and then using digital fingerprinting to track the data or the related users.
Solving several problems one after another by stacking several technologies is not innovative and also reduces the efficiency of the scheme. In today's high-speed, real-time data interaction environments, executing different technologies in stages to achieve multiple goals is no longer suitable. In addition, although conventional digital fingerprinting has advanced with the development and popularization of big data, text-based digital fingerprinting schemes remain rare. The most fundamental reason is that text-type data contains too little redundant information to make fingerprint embedding easy. Most text-based data fingerprinting schemes meet basic requirements by annotating the values of attributes or tuples in different ways, and they typically require complex encryption algorithms to be layered on top, which greatly reduces usability and simplicity. In summary, the prior art lacks an algorithm based on digital fingerprinting that combines privacy protection of text-type data carriers with traitor tracing.
Disclosure of Invention
Purpose of the invention: existing digital fingerprint algorithms cannot provide privacy protection and traitor tracing for text-type data carriers at the same time, and text-based digital fingerprinting schemes are lacking. To solve these problems, the invention provides a carrier data privacy protection method based on traceable privacy-enhanced semi-blind digital fingerprints, and a method for detecting the identification information of related users in illegally propagated data sets.
Technical scheme: a data privacy protection method based on traceable privacy-enhanced semi-blind digital fingerprints comprises the following steps:
receiving a privacy protection request from a data requester, and acquiring the carrier data set to be privacy protected and the user fingerprint information;
dividing the carrier data set D into suitable k × k data blocks to obtain an aggregated data block set $D_{k\times k}$;
encoding the user fingerprint information into a fingerprint coordinate matrix $S^*$, and then applying a permutation encryption algorithm to the obtained fingerprint coordinate matrix $S^*$ to obtain a new fingerprint coordinate matrix $S^*$;
calculating the differential privacy sensitivity of the carrier data set D, and combining the differential privacy sensitivity with the differential privacy parameters ε and δ to calculate the value range of the variance σ of the differential-privacy Gaussian mechanism; dividing the value range of the variance σ into k equal parts and placing them into a matrix V in order of magnitude to obtain the matrix $V_{1\times k}$; substituting each scalar of the matrix $V_{1\times k}$ in turn into the differential-privacy Gaussian mechanism to generate a random noise block matrix P;
extracting the new fingerprint coordinate matrix $S^*$ by column, using the 2 values of each column to locate the coordinates in the aggregated data block set $D_{k\times k}$ at which a random noise block is to be embedded, and embedding the random noise blocks at the corresponding positions to obtain a noisy data set (its symbol is rendered as an image in the original publication); the elements of the noisy data set are the privacy-protected carrier data;
feeding the noisy data set back to the data requester.
Further, the carrier data set D consists of numerical aggregated data in text form.
Further, encoding the user fingerprint information into the fingerprint coordinate matrix $S^*$ specifically comprises:
converting the user fingerprint information S, given as a character string, into a decimal matrix $(S)_{10}$, and then converting the decimal matrix $(S)_{10}$ into a binary matrix $(S)_2$;
according to the number of data blocks in the aggregated data block set $D_{k\times k}$, converting the binary matrix $(S)_2$ into a matrix $(S)_k$ with the corresponding radix k;
transforming the matrix $(S)_k$ into a fingerprint coordinate matrix $S^*$ of 2 rows and m columns.
Further, applying the permutation encryption algorithm to the obtained fingerprint coordinate matrix $S^*$ specifically comprises:
performing permutation encryption on the fingerprint coordinate matrix $S^*$ using a permutation matrix R, where the permutation matrix R is calculated from the order of each letter in a custom permutation key $\kappa_1$.
Further, calculating the differential privacy sensitivity of the carrier data set D comprises:
calculating the differential privacy sensitivity $\Delta_2 f$ of the carrier data set D according to formula (1):
$\Delta_2 f = \max_{D, D'} \lVert f(D) - f(D') \rVert_2 \quad (1)$
where f is a query function, D denotes the carrier data set, and D′ denotes a neighboring data set of D; D and D′ differ in exactly one record;
combining the differential privacy sensitivity with the differential privacy parameters ε and δ to calculate the value range of the variance σ of the differential-privacy Gaussian mechanism comprises:
calculating the value range of the variance σ of the differential-privacy Gaussian mechanism according to formula (2):
[Formula (2) is rendered as an image in the original publication: the value range of σ expressed in terms of $\Delta_2 f$, ε and δ.]
where $\Delta_2 f$ is the differential privacy sensitivity, and ε and δ are the differential privacy parameters.
The invention also discloses a method for detecting the identification information of related users in an illegally propagated data set, which comprises the following steps:
performing privacy protection on a carrier data set according to a data privacy protection method to obtain the corresponding noisy data set, the data privacy protection method being the above data privacy protection method based on traceable privacy-enhanced semi-blind digital fingerprints;
performing a hash calculation on each data block of the aggregated data block set $D_{k\times k}$ to obtain a hash matrix H;
obtaining a carrier data set $D^*$ to be detected, and dividing it into k × k data blocks in the same blocking manner as the carrier data set D to obtain an aggregated data block set $\{D^*\}_{k\times k}$; computing the hash value of each data block in $\{D^*\}_{k\times k}$ to obtain a hash matrix $H^*$;
comparing the hash matrix H with the hash matrix $H^*$ and recording the coordinate points at which the two hash matrices differ, to obtain a coordinate matrix $S^*$ of 2 rows and m columns;
according to the coordinate matrix $S^*$, locating and extracting the corresponding data blocks in the aggregated data block set $\{D^*\}_{k\times k}$, calculating the variance σ of these data blocks through Gaussian fitting, and storing the fitted variances σ in a matrix M;
combining the coordinate matrix $S^*$ and the matrix M by rows into a matrix U, and rearranging the matrix U by columns according to the magnitude of the elements of M to obtain a new matrix U;
extracting the first 2 rows and the first k columns of the new matrix U as the base-k noise fingerprint matrix $(S^*)_k$; converting the noise fingerprint matrix $(S^*)_k$ into noise fingerprint data in decimal form;
obtaining the identification information character string of the related user from the decimal noise fingerprint data through a permutation decryption algorithm, the identification information character string being the user fingerprint information in character string form.
Further, recording the coordinate points at which the two hash matrices differ to obtain the coordinate matrix $S^*$ of 2 rows and m columns comprises:
recording the coordinate points at which the two hash matrices differ, writing the abscissas column by column into row 1 of the coordinate matrix $S^*$ and the ordinates into row 2 of the coordinate matrix $S^*$, to obtain a coordinate matrix $S^*$ of 2 rows and m columns.
Further, calculating the variance σ of the data blocks by Gaussian fitting comprises:
calculating the variance σ of these data blocks according to the following formula:
[Formula rendered as an image in the original publication: σ is obtained by applying fitGauss to a data block of $\{D^*\}_{k\times k}$.]
where fitGauss denotes a Gaussian fitting function, and its argument (a symbol rendered as an image in the original publication) denotes the data block in row r and column c of the aggregated data block set $\{D^*\}_{k\times k}$.
Advantages: compared with the prior art, the invention has the following advantages:
(1) The invention combines differential privacy with digital fingerprinting by embedding carefully designed noise at specific positions of the numerical aggregated data set, so that fingerprint embedding and noise perturbation of the carrier data are accomplished in one step, while both the protection of private information in the carrier data set and traitor tracing are ensured;
(2) With regard to privacy protection of the carrier data, the method can flexibly meet different privacy protection requirements and, in particular, preserves the statistical usability of numerical carrier data with higher precision;
(3) With regard to traitor tracing, the method satisfies the basic imperceptibility, robustness and credibility requirements of digital fingerprints, and can resist collusion attacks as well as newer attacks such as buyer trapping, identity-authentication attacks and man-in-the-middle attacks;
(4) The method is suited to using text-type data carriers as the object of fingerprint embedding, and achieves privacy protection and tracking of the carrier data with simplified computation.
Drawings
FIG. 1 is a diagram of a system model of the present invention;
FIG. 2 is a diagram of a system security model of the present invention;
FIG. 3 is a schematic diagram of the digital fingerprint generation and noise embedding process of the present invention;
FIG. 4 is a flow chart of a simulation implementation of the digital fingerprint generation and noise embedding process of the present invention;
FIG. 5 is a schematic diagram of a semi-blind fingerprint detection process of the present invention;
FIG. 6 is a flow chart of a simulation implementation of the semi-blind fingerprint detection process of the present invention;
FIG. 7 is a comparison graph of statistical probabilities of a total original data set and a total noisy data set of a simulation experiment of the present invention;
FIG. 8 is a comparison graph of statistical probabilities of the 8 sets of labeled original data blocks and noisy data blocks in a simulation experiment of the present invention;
FIG. 9 is a box plot comparing key values of the 8 sets of noise-processed data blocks with the corresponding original data blocks in a simulation experiment of the present invention;
FIG. 10 is a relationship diagram of correct fingerprint identification rate CRR after different amounts of data are randomly deleted in a simulation experiment according to the present invention;
FIG. 11 is a comparison graph of key parameters for randomly deleting different amounts of data in a simulation experiment according to the present invention;
FIG. 12 is a CRR relationship diagram of the correct fingerprint identification rate after different amounts of data are randomly added in the simulation experiment of the present invention;
FIG. 13 is a comparison graph of key parameters of simulation experiments of the present invention with different amounts of data randomly added.
Detailed Description
The technical solution of the present invention will be further explained with reference to the accompanying drawings and embodiments.
The method of the present invention comprises two basic processes, respectively a digital fingerprint generation and noise embedding process and a semi-blind fingerprint detection process. Wherein the digital fingerprint generation and noise embedding process is used to implement a fingerprint encoding data generation for a specific user and a fingerprint noise embedding function for the original data set; a semi-blind fingerprint detection process is used to detect user-related identifying information in the captured illegally distributed data set.
Example 1:
This embodiment discloses a data privacy protection method based on privacy-enhanced semi-blind digital fingerprints, which comprises the following implementation steps:
Step 1: receiving a privacy protection request from a data requester, and acquiring the carrier data set to be privacy protected and the user fingerprint information; encoding the fingerprint information of the user into a fingerprint coordinate matrix $S^*$. This specifically comprises:
converting the user fingerprint information S, given as a character string, into a decimal matrix $(S)_{10}$, and further converting it into a binary matrix $(S)_2$; this process is denoted $S \to (S)_{10} \to (S)_2$. The original data set D is divided into suitable k × k data blocks to obtain an aggregated data block set $D_{k\times k}$. According to the number of data blocks in $D_{k\times k}$, $(S)_2$ is converted into a matrix $(S)_k$ with the corresponding radix k, which can be expressed as $(S)_2 \to (S)_k$. Finally, the matrix $(S)_k$ is transformed into a matrix $S^*$ of 2 rows and m columns, which is the required fingerprint coordinate matrix; this process is denoted $(S)_k = (S_k)_{1\times n} \to (S_k)_{2\times m} = S^*$.
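The conversion chain in step 1 can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how characters map to decimal values (character codes are assumed here), how many bits each value occupies (8 is assumed), or how the 1 × n row is reshaped into 2 × m (the first half is taken as row 1 and the second half as row 2, with a leading zero as padding). The names `to_base_k` and `encode_fingerprint` are hypothetical.

```python
def to_base_k(n: int, k: int) -> list[int]:
    """Return the base-k digits of a non-negative integer, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, r = divmod(n, k)
        digits.append(r)
    return digits[::-1]


def encode_fingerprint(s: str, k: int = 8) -> list[list[int]]:
    """Sketch of S -> (S)_10 -> (S)_2 -> (S)_k -> 2 x m coordinate matrix."""
    decimal = [ord(ch) for ch in s]                    # (S)_10: assumed character codes
    bits = "".join(format(d, "08b") for d in decimal)  # (S)_2: assumed 8 bits per character
    digits = to_base_k(int(bits, 2), k)                # (S)_k as a 1 x n row
    if len(digits) % 2:                                # assumed padding so that n = 2m
        digits = [0] + digits
    m = len(digits) // 2
    return [digits[:m], digits[m:]]                    # row 1: abscissas, row 2: ordinates


coords = encode_fingerprint("Hi", k=8)
print(coords)
```

With k = 8 every entry lies between 0 and 7, so each column can address one block of an 8 × 8 grid.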
Step 2: to ensure stronger privacy, a permutation encryption operation is applied, using a permutation matrix R, to the fingerprint coordinate matrix $S^*$ obtained in step 1, yielding a new fingerprint coordinate matrix $S^*$. The permutation matrix R is calculated from the order of each letter in the permutation key $\kappa_1$ and can be expressed as
[Formula rendered as an image in the original publication: the definition of R in terms of $\kappa_1$.]
For example, if $\kappa_1$ is "MAKE", then R = [4, 1, 3, 2]. Using the permutation matrix R, the fingerprint coordinate matrix $S^*$ can be permuted into a new fingerprint coordinate matrix $S^*$.
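The "MAKE" example is consistent with taking each letter's rank in alphabetical order (A = 1, E = 2, K = 3, M = 4). The sketch below computes R that way. How R is then applied to the coordinate matrix is not spelled out in the text, so the block-wise reordering shown here is an assumption, and all function names are hypothetical.

```python
def permutation_from_key(key: str) -> list[int]:
    """Rank of each letter of the key in alphabetical order (ties broken by position)."""
    order = sorted(range(len(key)), key=lambda i: (key[i], i))
    ranks = [0] * len(key)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def apply_permutation(items: list, R: list[int]) -> list:
    """Assumed scheme: reorder items in consecutive groups of len(R); a short tail is left as-is."""
    n = len(R)
    out = list(items)
    for start in range(0, len(items) - n + 1, n):
        block = items[start:start + n]
        out[start:start + n] = [block[r - 1] for r in R]
    return out


def invert_permutation(R: list[int]) -> list[int]:
    """Permutation that undoes apply_permutation(..., R)."""
    inv = [0] * len(R)
    for pos, r in enumerate(R, start=1):
        inv[r - 1] = pos
    return inv


R = permutation_from_key("MAKE")
print(R)  # [4, 1, 3, 2]
```

Decryption under this assumed scheme is `apply_permutation(ciphertext, invert_permutation(R))`.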
Step 3: based on the privacy protection requirements for the data set, calculate the differential privacy sensitivity $\Delta_2 f$ of the original data set D, where f is a query function. The differential privacy sensitivity is calculated as:
$\Delta_2 f = \max_{D, D'} \lVert f(D) - f(D') \rVert_2 \quad (1)$
Then, combining the known differential privacy parameters ε and δ, calculate the value range of the variance σ of the differential-privacy Gaussian mechanism according to the following formula:
[Formula (2) is rendered as an image in the original publication: the value range of σ expressed in terms of $\Delta_2 f$, ε and δ.]
Next, divide the value range of the variance σ into k equal parts and place them into a matrix V in order of magnitude to obtain the matrix $V_{1\times k}$. Finally, substitute each scalar of $V_{1\times k}$ into the differential-privacy Gaussian mechanism $N(0, \sigma^2)$ to generate the random noise block matrix P.
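Step 3 can be sketched as below. Because formula (2) is not recoverable from the source, the classical Gaussian-mechanism calibration $\sigma \ge \sqrt{2\ln(1.25/\delta)}\,\Delta_2 f/\varepsilon$ is swapped in purely as an assumption and may differ from the patent's formula. Taking k evenly spaced values between the two ends of the range is also an assumed reading of "k equal parts". All function names are hypothetical; σ is used as the standard deviation of $N(0, \sigma^2)$.

```python
import math
import random


def gaussian_sigma(sensitivity: float, eps: float, delta: float) -> float:
    """Classical Gaussian-mechanism calibration (an assumption, not the patent's formula (2))."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / eps


def sigma_levels(lo: float, hi: float, k: int) -> list[float]:
    """k evenly spaced sigma values from lo to hi in ascending order (the matrix V, 1 x k)."""
    if k == 1:
        return [lo]
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]


def noise_blocks(levels: list[float], block_size: int, seed: int = 0) -> list[list[float]]:
    """One block of N(0, sigma_i^2) samples per sigma level (the matrix P)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, s) for _ in range(block_size)] for s in levels]


# Hypothetical parameter ranges; the weakest and strongest settings bound the sigma range.
lo = gaussian_sigma(1.0, eps=1.0, delta=1e-2)
hi = gaussian_sigma(1.0, eps=0.5, delta=1e-5)
V = sigma_levels(lo, hi, k=8)
P = noise_blocks(V, block_size=4)
```

Using distinct, ordered σ values is what later allows the detector to recover the order of the embedded blocks from their fitted spread.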
Step 4: according to the fingerprint coordinate matrix $S^*$ generated in step 2, locate the positions in the original carrier data set D at which noise is to be embedded, and embed the random noise blocks generated in step 3 at the corresponding positions in turn. The specific operation comprises:
extracting the fingerprint coordinate matrix $S^*$ generated in step 2 by column, the 2 values of each column being used to locate the coordinates in the aggregated data block set $D_{k\times k}$ at which a random noise block is to be embedded. The random noise blocks are extracted column by column from the random noise block matrix P and superimposed in turn, as additive noise, on the corresponding data blocks of the aggregated data block set $D_{k\times k}$, finally yielding a Traceable Noisy Data set (TND; its symbol is rendered as an image in the original publication).
The noisy data set is then fed back to the data requester.
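A minimal sketch of the additive embedding in step 4 follows, assuming the aggregated data block set is held as a k × k nested list whose cells are lists of numbers and that one noise block is consumed per column of the coordinate matrix. The function name `embed_noise` and the data layout are assumptions for illustration only.

```python
def embed_noise(blocks, coords, noise):
    """Add the j-th noise block to the data block addressed by column j of coords.

    blocks: k x k grid, each cell a list of numbers (assumed layout)
    coords: 2 x m matrix, row 0 = block row index, row 1 = block column index
    noise:  list of noise blocks, each as long as a data block
    """
    out = [[list(cell) for cell in row] for row in blocks]  # copy; the input is not modified
    for r, c, p in zip(coords[0], coords[1], noise):
        out[r][c] = [x + n for x, n in zip(out[r][c], p)]
    return out


grid = [[[1.0, 2.0], [1.0, 2.0]],
        [[1.0, 2.0], [1.0, 2.0]]]
noisy = embed_noise(grid, [[0, 1], [1, 0]], [[0.5, 0.5], [-1.0, 1.0]])
print(noisy[0][1], noisy[1][0])
```

Blocks that are not addressed by any column keep their original values, which is what the hash comparison in the detection stage relies on.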
Example 2:
This embodiment discloses a method for detecting the identification information of related users in an illegally propagated data set, which comprises the following steps:
To enable semi-blind fingerprint detection, the hash values of the aggregated data block set $D_{k\times k}$ are computed before the fingerprint noise is embedded. The data blocks of $D_{k\times k}$ are extracted in turn, a 128-bit hash value is calculated for each data block using the MD5 algorithm, and the hash values are stored row by row in the hash matrix H. The hash matrix H is published to give the third-party arbitration institution a reference for any subsequent semi-blind detection.
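The per-block MD5 hashing can be sketched with Python's standard `hashlib`. How a block of numbers is serialized to bytes before hashing is not stated in the source; the comma-joined `repr` used here is an assumption, as are the function names.

```python
import hashlib


def block_hash(block) -> str:
    """MD5 digest (128 bits, 32 hex characters) of one data block."""
    payload = ",".join(repr(float(x)) for x in block).encode("utf-8")  # assumed serialization
    return hashlib.md5(payload).hexdigest()


def hash_matrix(blocks) -> list[list[str]]:
    """Hash every cell of a k x k grid of blocks, row by row."""
    return [[block_hash(cell) for cell in row] for row in blocks]


grid = [[[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [1.0, 2.0]]]
H = hash_matrix(grid)
print(H[0][0])
```

Each digest is 32 hexadecimal characters, which matches the 64 × 32 character matrix described for the 8 × 8 case later in the text.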
The carrier data set $D^*$ to be detected is divided into k × k data blocks in the same blocking manner to obtain an aggregated data block set $\{D^*\}_{k\times k}$. The hash value of each data block of $\{D^*\}_{k\times k}$ is calculated in turn and stored row by row in a hash matrix $H^*$.
The two hash matrices H and $H^*$ are compared row by row, and the coordinate points at which their values differ are recorded: the abscissas are written column by column into row 1 of the coordinate matrix $S^*$ and the ordinates into row 2 of the coordinate matrix $S^*$, finally yielding a coordinate matrix $S^*$ of 2 rows and m columns.
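The comparison step can be sketched as a plain element-wise scan of the two hash matrices. The function name `diff_coordinates` is hypothetical, and the hash strings in the usage example are placeholders rather than real digests.

```python
def diff_coordinates(H, H_star) -> list[list[int]]:
    """Return a 2 x m matrix: row 0 holds row indices, row 1 holds column indices
    of the cells where the two hash matrices differ (scanned row by row)."""
    rows, cols = [], []
    for r, (row_a, row_b) in enumerate(zip(H, H_star)):
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if a != b:
                rows.append(r)
                cols.append(c)
    return [rows, cols]


H = [["aa", "bb", "cc"],
     ["dd", "ee", "ff"]]
H_star = [["aa", "xx", "cc"],
          ["yy", "ee", "ff"]]
S = diff_coordinates(H, H_star)
print(S)  # [[0, 1], [1, 0]]
```

The coordinates come out in scan order, not in embedding order, which is why the variance-based reordering described next is needed.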
According to the calculated coordinate points, the corresponding data blocks of the aggregated data block set $\{D^*\}_{k\times k}$ are located and extracted in turn, and the variance σ of each of these data blocks is calculated by Gaussian fitting, which can be expressed as:
[Formula rendered as an image in the original publication: σ is obtained by applying fitGauss to the located data block of $\{D^*\}_{k\times k}$.]
The fitted variances σ are then stored column by column in the matrix M.
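As a stand-in for the fitGauss function named in the text, whose definition is not given, the sketch below fits a normal distribution with `statistics.NormalDist.from_samples` from the Python standard library and returns its spread parameter. The helper names and the nested-list block layout are assumptions.

```python
from statistics import NormalDist


def fit_gauss_sigma(block) -> float:
    """Fit a normal distribution to the block's values and return its sigma."""
    return NormalDist.from_samples(block).stdev


def fitted_sigmas(blocks, coords) -> list[float]:
    """Sigma of each block addressed by the columns of the 2 x m coordinate matrix (matrix M)."""
    return [fit_gauss_sigma(blocks[r][c]) for r, c in zip(coords[0], coords[1])]


grid = [[[1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 10.0, 20.0, 30.0, 40.0]],
        [[2.0, 2.5, 3.0, 3.5, 4.0], [1.0, 1.0, 2.0, 2.0, 3.0]]]
M = fitted_sigmas(grid, [[0, 1], [1, 0]])
print(M)
```

Note that fitting the noisy block directly measures the spread of data plus noise, so the ordering is reliable only when the embedded noise levels are well separated relative to the variation between blocks.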
The coordinate matrix $S^*$ and the matrix M are combined by rows into a new matrix U, which can be expressed as:
$U = [S^* \; M]'$;
the matrix U is then rearranged by columns according to the magnitude of the elements of M to obtain a new matrix U.
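The reordering can be sketched by treating each column of U as a triple (abscissa, ordinate, σ) and sorting the triples by σ. Ascending order is assumed here because the σ values were placed into V in order of magnitude during embedding; the function name is hypothetical.

```python
def reorder_coordinates(S, M) -> list[list[int]]:
    """Sort the columns of the 2 x m coordinate matrix S by the matching entries of M
    (ascending, assumed) and return the reordered first two rows of U."""
    columns = sorted(zip(S[0], S[1], M), key=lambda col: col[2])
    return [[col[0] for col in columns], [col[1] for col in columns]]


S = [[0, 3, 5], [7, 2, 1]]
M = [2.4, 0.9, 1.6]
print(reorder_coordinates(S, M))  # [[3, 5, 0], [2, 1, 7]]
```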
The first 2 rows and the first k columns of the new matrix U are extracted as the base-k noise fingerprint matrix $(S^*)_k$, which can be expressed as:
$S^* = U_{1:2,\,1:\mathrm{end}}, \quad S^* = (s_k)_{2\times m} \to (s_k)_{1\times n} = (S)_k$
This is then converted into decimal form, which can be expressed as $(S)_k \to (S)_2$, $(S)_2 \to (S)_{10}$, and finally the identification information character string s of the related user is obtained through the permutation decryption algorithm.
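The conversion chain $(S)_k \to (S)_2 \to (S)_{10}$ and the final mapping back to characters can be sketched as below. The layout is an assumption, since the source does not specify it: row 1 followed by row 2 forms the digit row, each character occupies 8 bits, and leading padding digits are zero. The permutation decryption step is omitted, and `decode_fingerprint` is a hypothetical name.

```python
def decode_fingerprint(coords, k: int = 8) -> str:
    """Sketch of 2 x m base-k coordinate matrix -> (S)_k -> (S)_2 -> (S)_10 -> string."""
    digits = list(coords[0]) + list(coords[1])      # (s_k) 2 x m -> 1 x n (assumed layout)
    value = 0
    for d in digits:                                # base-k digits to an integer
        value = value * k + d
    bits = format(value, "b")                       # (S)_2
    bits = bits.zfill(-(-len(bits) // 8) * 8)       # restore leading zeros to a multiple of 8
    codes = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]  # (S)_10
    return "".join(chr(c) for c in codes)


print(decode_fingerprint([[0, 4, 4], [1, 5, 1]], k=8))  # Hi
```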
Example 3:
As shown in fig. 1, the method of this embodiment consists of two processes: digital fingerprint generation with noise embedding, and semi-blind fingerprint detection. Fig. 1 is a system model diagram of the method. To maximize the method's efficiency, it is crucial to place these processes in a suitable environment. The method of this embodiment suits a multi-user data interaction mode. The black solid lines and circled numerals in the figure indicate the process by which the method publishes the traceable noisy data set and generates the corresponding hash matrix H, while the dotted lines and boxed numerals indicate the semi-blind fingerprint detection process. The model comprises three types of entities: the data requester, the data service provider, and the third-party arbitration institution. Data requesters need specially tailored data sets in order to perform queries, analyses or predictions. In this embodiment, the data requester is required to send identifying information such as the user ID (UID), time and device physical address (MAC address) together with the data request. A data requester may also be a malicious user who illegally propagates confidential information. Data service providers generally have two functions: one is to store and classify all collected data, and the other is to provide data computing services. In this embodiment, the computing services include generating a fingerprint and embedding it into the original data set, computing hash values, and generating the Traceable Noisy Data set (TND). When a dispute arises, the third-party arbitration institution performs identifier extraction and arbitration on the illegal data. It holds the hash results of the original data set, computes the hash results of the data set to be detected, and compares the two, thereby accurately and safely calculating the corresponding coordinate fingerprint matrix and determining the malicious or related user.
The security model aims to give each subject in the method the most restrictive access to the object data that still leaves the data usable. As shown in fig. 2, the arrows in the figure represent response and restriction relationships between entities. The subjects of the security model are the data requester, the data service provider and the third-party arbitration institution. The objects are the original data set, the corresponding noisy fingerprint data set, the original hash matrix and the noisy hash matrix. Only the data service provider can own and access the original data set; the data requester obtains a noisy data set that has undergone privacy protection processing such as encryption, permutation and differential privacy and that carries the requester's own identifiable information. The third-party arbitration institution can access only the hash matrix of the original data set and the data set to be detected. Because it cannot access the original data set, the risk posed by an untrusted third party is largely eliminated.
The core of the method of this embodiment is to embed a carefully designed set of noise at specific positions of the original aggregated data set to achieve privacy protection. The noise set satisfies differential privacy, and the identifying information of the data requester is converted into the specific coordinates at which the noise is embedded in the original data set. Different data requesters have different identifying information, which may be a UID, a MAC address, or a combination of these. Different copies of the original data set are produced by embedding the designed random noise at different coordinates. These coordinates are generated from a fingerprint encoded from the data requester's seed. Each copy is unique throughout the data interaction process and is referred to as a TND in this embodiment.
For convenience of simulation and description, all data involved in the method, including the original aggregated data set and the noisy data set, are represented uniformly in matrix form. As shown in fig. 3, the original aggregated data set is divided into a square array of data blocks; the figure and the flowchart details below use an 8 × 8 division (i.e., k = 8) as an illustration. The differential privacy sensitivity of the data set is calculated according to the size of the data set and the required strength of privacy protection, and the differential privacy parameters ε and δ are given. The value range of the variance σ of the differential-privacy Gaussian mechanism required by the method is then calculated according to the corresponding formula (rendered as an image in the original publication). The value range of σ is divided equally to obtain k sub-variances $\sigma_i^2$ of different sizes. Based on the differential-privacy Gaussian mechanism $N(0, \sigma_i^2)$, random noise blocks $P_i$ following different Gaussian distributions are generated, where the amount of noise in each block $P_i$ should equal the amount of data in one block of the original data set. Since the original aggregated data set D is divided evenly into an 8 × 8 form, if r denotes the amount of data in the original data set, each data block holds r/64 items, and each generated random noise block $P_i$ therefore also contains r/64 noise values. The noise blocks $P_i$ are inserted in turn at specific positions among the original data blocks to obtain the TND (its symbol is rendered as an image in the original publication).
The coordinates at which the noise blocks are embedded are compiled from the user's identifying information. The identifying information is first converted into a one-dimensional base-k matrix (k = 8), which is then transformed into a two-dimensional matrix $S^*$ of 2 rows and m columns. Each column of this matrix identifies one coordinate point: the first row holds the abscissa of the point and the second row holds its ordinate. To allow for possible fingerprint detection, the MD5 algorithm is applied block by block to the divided original data set $D_{k\times k}$, and the 128-bit hash value of each data block is stored row by row in the matrix H.
The method of this embodiment can be implemented in a compiler-based programming language, and fig. 4 shows a simulation implementation flow of the fingerprint generation and noise embedding process. The function embed_getembedded() obtains the current time and the user's identifying information and places their binary form into a one-dimensional matrix. The function embed_GetCoordinate() then converts this matrix into an octal coordinate matrix $S^*$ of 2 rows and n columns. The function getData() extracts 5 columns of target data from the original data set and places them into a matrix named Originaldata, which is then divided evenly into a cell array OriginalCell in the form of an 8 × 8 square of data blocks. OriginalCell and the differential privacy parameters ε and δ are passed to the function embed_ComputeSigmaMat(), which calculates the value range of the parameter σ of the differential-privacy Gaussian mechanism. Finally, the function embed_NoiseToOD() inserts the Gaussian noise sets generated from the different σ values, in order, at different positions of OriginalCell; these positions are given by the coordinate matrix generated by embed_GetCoordinate(). In addition, the function embed_gettotaltodhash() computes the hash values of the original data set and stores them row by row in the matrix H for possible subsequent fingerprint detection.
The process of semi-blind fingerprint detection is shown in fig. 5. The method of the embodiment compares the hash result of the illegal data set to be detected with the hash result of the corresponding original data set to realize the semi-blind detection of the digital fingerprint. By comparing the two hash results, coordinate points with different hash values can be marked. The coordinate points at this time are a series of 0 to 63 (k × k =8 × 8= 64), the division of these numerical values by k (k = 8) is performed, the quotient and remainder thereof are the abscissa of the obtained mark point, and the results of these calculations are stored in columns in the matrix S * Wherein the abscissa is stored to the matrix S * The ordinate is stored to the second row. And the correct order of the marked coordinate points can be determined by calculating and rearranging the data blocks at specific locations. According to S * Sequentially extracting and fitting the coordinate points to calculate { D * } k×k The variance of the gaussian distribution of the corresponding data block,
Figure BDA0003953800550000091
These variance values are stored column by column in the matrix M. S* and M are combined by rows into a new matrix U = [S*; M] with 3 rows and m columns, and the columns of U are rearranged according to the values in its last row. The first 2 rows and first k columns of U form the noise fingerprint matrix (S*)_k of the data set to be detected. Finally, matrix transformation and base conversion are applied to (S*)_k to calculate the identification information s.
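The hash comparison and the conversion of block indices into coordinates can be sketched as follows. This is a toy illustration under stated assumptions: the names block_hash and mark_changed_blocks are hypothetical, MD5 is chosen only because it yields 32 hexadecimal characters (the embodiment does not name its hash function), and the blocks are assumed to be numbered row by row from 0 to k × k − 1.

```python
import hashlib


def block_hash(block):
    """Hash one data block; MD5 is an assumption (it yields 32 hex characters)."""
    text = ",".join(repr(float(v)) for v in block)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def mark_changed_blocks(original_hashes, suspect_blocks, k=8):
    """Return a 2-row matrix [abscissas, ordinates] of blocks whose hash changed."""
    rows, cols = [], []
    for index, block in enumerate(suspect_blocks):
        if block_hash(block) != original_hashes[index]:
            q, r = divmod(index, k)  # quotient -> abscissa, remainder -> ordinate
            rows.append(q)
            cols.append(r)
    return [rows, cols]


# Toy data: 64 blocks of 4 values each; blocks 5, 17 and 42 are perturbed.
original = [[float(i)] * 4 for i in range(64)]
H = [block_hash(b) for b in original]  # only the hashes are kept for detection
suspect = [list(b) for b in original]
for i in (5, 17, 42):
    suspect[i][0] += 0.5
S_star = mark_changed_blocks(H, suspect)
```

Only the stored hashes of the original blocks are consulted, not the original data itself, which is what makes the detection semi-blind. The coordinates come out in index order and still have to be reordered by the fitted variances.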
Fig. 6 shows a simulation implementation flow of the semi-blind digital fingerprint detection process of the method. The function detect_GetNDHash(ND), whose input parameter ND is the data set to be detected, calculates the hash value of each data block of {D*}_k×k and stores the results in the 64 × 32 character matrix H*. The function detect_compleHash() compares the hash matrix H of the original data set with the matrix H* generated by detect_GetNDHash(ND) and records the position coordinates whose contents differ in the matrix S*. Note that the coordinate matrix S* is unordered at this point. The function detect_resetCoor() fits the variance of the data blocks of {D*}_k×k at the coordinate positions extracted from S* column by column, and then rearranges the matrix S* according to the fitted variance values. After this step, the first k columns of S* are the desired fingerprint coordinates. Finally, the function detect_getSeed() applies matrix transformation and base conversion to this matrix and calculates the identifying information s corresponding to S*.
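The reordering step can be sketched in Python as below. The sketch makes several assumptions that the text does not fix: the population variance (the maximum-likelihood variance of a fitted Gaussian) stands in for the Gaussian fitting function, the columns are sorted in ascending order of variance, and blocks are stored in a flat list numbered row by row. The function name reorder_by_variance is hypothetical.

```python
import statistics


def reorder_by_variance(S_star, blocks, k=8):
    """Sort the columns of S_star by the fitted variance of the block they address.

    Returns the first k reordered coordinate columns and the sorted variances.
    """
    columns = list(zip(S_star[0], S_star[1]))
    # M holds one fitted variance per marked coordinate (assumed row-major layout).
    M = [statistics.pvariance(blocks[r * k + c]) for r, c in columns]
    # U stacks S_star on top of M; its columns are sorted by the last row.
    U = sorted(zip(S_star[0], S_star[1], M), key=lambda col: col[2])
    ordered = [[col[0] for col in U], [col[1] for col in U]]
    return [row[:k] for row in ordered], [col[2] for col in U]


# Toy data: 64 blocks of 16 zeros; three blocks carry alternating +/- values
# whose spread (and therefore variance) differs from block to block.
blocks = [[0.0] * 16 for _ in range(64)]
for index, scale in ((5, 3.0), (17, 1.0), (42, 2.0)):
    blocks[index] = [scale if j % 2 else -scale for j in range(16)]

unordered = [[0, 2, 5], [5, 1, 2]]  # coordinates of blocks 5, 17 and 42
fingerprint, variances = reorder_by_variance(unordered, blocks)
```

In this toy case the block with the smallest spread (index 17) moves to the first column and the block with the largest spread (index 5) moves to the last, which mirrors how embedding noise of different σ at successive coordinates lets the detector recover their order.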
Simulation experiments of the method are carried out on a data set of 420,768 numerical records, each of which has 5 columns of attribute data. A time accurate to the second, in the form of the character string "20221106212056", is selected as the identifying information, because time is dynamic and more difficult to control than a user ID or MAC address; if the time can be accurately embedded and detected, other static seeds are easier to handle. The key parameters of the method are assigned as follows: k = 8, ε ∈ [0.5, 1], δ ∈ [10^-5, 1]. The simulation experiment and analysis show that the correct identification information can be successfully embedded and extracted. Figs. 7, 8 and 9 show the usability of TND from different angles. Fig. 7 compares the overall data difference between the original data set D and the noise data set TND; figs. 8 and 9 specifically compare the differences between the 8 pairs of data blocks corresponding to the fingerprint coordinates in the original data set and the noisy data set. Fig. 8 compares the differences between the data blocks from the point of view of statistical probability, and fig. 9 compares the differences in key values of the data blocks, such as the mean and variance.
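For the parameter ranges given above, the derivation of k noise levels can be sketched as follows. The formula used here is the classical Gaussian-mechanism bound σ ≥ sqrt(2 ln(1.25/δ)) · Δ₂f / ε, which is an assumption: the formula of the embodiment is not reproduced in this text and may differ. Taking the endpoints of the σ range from the endpoints of the ε and δ ranges, spacing the k values evenly, the sensitivity value 1.0 and all function names are likewise illustrative assumptions.

```python
import math
import random


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Classical Gaussian-mechanism scale: sqrt(2 ln(1.25/delta)) * sensitivity / epsilon."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def sigma_levels(sensitivity, eps_range=(0.5, 1.0), delta_range=(1e-5, 1.0), k=8):
    """Return k evenly spaced sigma values, smallest first (a stand-in for V_1xk)."""
    sigma_min = gaussian_sigma(sensitivity, eps_range[1], delta_range[1])
    sigma_max = gaussian_sigma(sensitivity, eps_range[0], delta_range[0])
    step = (sigma_max - sigma_min) / (k - 1)
    return [sigma_min + i * step for i in range(k)]


def noise_block(sigma: float, size: int, seed: int = 0) -> list[float]:
    """Draw one block of zero-mean Gaussian noise; sigma is used as the standard deviation."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(size)]


V = sigma_levels(sensitivity=1.0)  # sensitivity 1.0 is a placeholder value
P = [noise_block(sigma, size=16, seed=i) for i, sigma in enumerate(V)]
```

Each of the k noise blocks in P would then be added to the data block addressed by the corresponding column of the fingerprint coordinate matrix. Note that the classical bound is usually stated for ε < 1, so the endpoint ε = 1 is a boundary case.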
Fig. 7 shows the statistical probability distribution of the entire data set over different value ranges. For the experiment of this example, the whole data range was divided into 30 equal parts, and the percentage of the total data falling in each part was calculated. As can be seen from the histogram, the probability distributions of the original data set and the noisy data set do not differ greatly, and the difference is positively correlated with the data amount; for example, the difference between the original data set and the noisy data set in the value range 0 to 100 is significantly larger than that in the range 500 to 600.
Fig. 8 compares the probability statistics of the data blocks corresponding to the fingerprint coordinates in the original and noisy data sets, and can be viewed as an extraction and refinement of fig. 7. The fingerprint coordinates select 8 of the 64 data blocks as the positions for embedding the differential privacy Gaussian noise. Fig. 8 compares the statistical probability distributions of the original data blocks and the noise data blocks at these 8 positions. The comparison shows that the probability distributions of the two are consistent and the difference is negligible.
Fig. 9 supplements this with the differences between key values of the 8 data blocks corresponding to the fingerprint coordinates, in the form of a combined box and bar graph. Fig. 9 shows the minimum (min), 25% value (Q1), median, 75% value (Q3) and maximum (max) for the 8 data block pairs. Q1, the median and Q3 form the box, and extension lines run from Q3 to the maximum and from Q1 to the minimum, indicating the degree of dispersion of the data. The box-plot comparison shows that the key values of the original data blocks and the TND blocks do not differ greatly, and that larger differences between groups are a reasonable phenomenon.
Robustness describes the survivability of a digital fingerprint after data processing operations. Given the high randomness and hash verification of the method of the embodiment, the study focuses on how randomly deleting, inserting and modifying data in a skipping manner affects the detection result. The experiments examine the relationship between the amount of data deleted or inserted and the accuracy of fingerprint detection.
Fig. 10 shows the Correct fingerprint Recognition Rate (CRR) after randomly deleting different amounts of data. Fig. 11 shows the relationship between the mean and the variance of CRR before and after deleting different amounts of data.
A data deletion attack deletes data in increasing amounts. We randomly select 2^n data items from TND, where n = 1, 2, and so on. The initial size of TND is r = 420768 × 5. The Gaussian mechanism should produce (r/64) × 8 random noise values, which are embedded at the specific fingerprint coordinates of the original data set.
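As a quick check of the count stated above, the expression (r/64) × 8 simplifies to r/8 for the data set used in the simulation:

$$
r = 420768 \times 5 = 2103840, \qquad \frac{r}{64} \times 8 = \frac{r}{8} = 262980 .
$$

That is, roughly one eighth of all values in the data set carry embedded noise.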
Because the noise generated by the Gaussian mechanism is random, the TND obtained each time is also different, and the 2^n data items to be deleted in the experiment are also randomly selected. After deleting 2^n data items, a binary fingerprint (S_D)_2 is calculated. Then (S_D)_2 and (S)_2 are compared bit by bit. The CRR after deleting 2^n data items is the ratio of the number of bits on which (S_D)_2 and (S)_2 agree to the total number of bits. The results of fig. 10 show that the CRR decreases as the amount of deleted data increases, and that it decreases significantly when the amount of deleted data reaches 2^15.
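The bit-by-bit comparison that defines the CRR can be written as a short Python sketch. The function name and the representation of the fingerprints as strings of '0' and '1' characters are assumptions for illustration.

```python
def correct_recognition_rate(extracted_bits: str, original_bits: str) -> float:
    """Fraction of bit positions on which the two binary fingerprints agree."""
    if len(extracted_bits) != len(original_bits):
        raise ValueError("fingerprints must have the same number of bits")
    if not original_bits:
        raise ValueError("fingerprints must not be empty")
    matches = sum(a == b for a, b in zip(extracted_bits, original_bits))
    return matches / len(original_bits)


# Toy example: the extracted fingerprint differs from the original in one of four bits.
crr = correct_recognition_rate("1011", "1001")
```

Here crr is 0.75. For reference, two unrelated bit strings with balanced bits agree on about half of their positions on average, so a CRR near 50% indicates that the fingerprint can no longer be distinguished from a random guess.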
For generality, the experiment for each deletion amount 2^n was performed 50 times. Fig. 11 compares the mean and variance of the CRR over the 50 experiments performed for each deletion amount 2^n. To express the CRR more clearly, the mean and variance of the CRR are calculated for each set of 50 results in turn. As shown, there are 9 sets of bars. Each set consists of three adjacent bars representing, respectively, the amount of data deleted and the mean and variance of the CRR over the 50 runs. The figure shows that the mean decreases as the amount of deleted data increases and that, when the amount of deleted data reaches 2^15, the mean CRR stabilizes at around 50%, which is consistent with fig. 10.
Similar to the deletion experiment, 2^n data items can also be randomly inserted into TND. Figs. 12 and 13 show the corresponding results for randomly inserted data. The binary fingerprint (S_1)_2 generated from the result set after inserting 2^n random data items is compared with the original fingerprint (S)_2 for consistency. As can be seen from fig. 12, the result of the insertion attack experiment is slightly worse than that of the deletion attack experiment. Fig. 13 compares the mean and variance of the CRR over 50 runs for each insertion amount 2^n. Consistent with the results in fig. 12, fig. 13 shows that when the amount of inserted data reaches 2^10, the mean CRR decreases and stabilizes at about 50%.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application; their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (8)

1. A data privacy protection method of privacy enhanced semi-blind digital fingerprints is characterized in that: the method comprises the following steps:
receiving a privacy protection request from a data requester, and acquiring a carrier data set to be subjected to privacy protection and user fingerprint information;
dividing the carrier data set D into k × k data blocks of suitable size to obtain an aggregated data block set D_k×k;
encoding the user fingerprint information into a fingerprint coordinate matrix S*, and then performing permutation encryption on the obtained fingerprint coordinate matrix S* with a permutation encryption algorithm to obtain a new fingerprint coordinate matrix S*;
calculating the differential privacy sensitivity of the carrier data set D, and combining the differential privacy sensitivity with the differential privacy parameters ε and δ to calculate the value range of the variance σ of the differential privacy Gaussian mechanism; dividing the value range of the variance σ into k equal parts and placing them in order of value into a matrix V to obtain the matrix V_1×k; substituting each scalar of the matrix V_1×k in turn into the differential privacy Gaussian mechanism to generate a random noise block matrix P;
extracting, column by column, the 2 values of each column of the new fingerprint coordinate matrix S* to locate a data block in the aggregated data block set D_k×k, and embedding the random noise blocks at the corresponding positions to obtain a noisy data set
Figure FDA0003953800540000011
wherein the elements of the noisy data set
Figure FDA0003953800540000012
are the carrier data after privacy protection;
feeding the noisy data set
Figure FDA0003953800540000013
back to the data requester.
2. The data privacy protection method of privacy enhanced semi-blind digital fingerprints according to claim 1, characterized in that: the carrier data set D is composed of numerical aggregated data in text form.
3. The data privacy protection method of privacy enhanced semi-blind digital fingerprints according to claim 1, characterized in that: encoding the user fingerprint information into a fingerprint coordinate matrix S* specifically comprises:
converting the user fingerprint information S in character string form into decimal matrix form (S)_10, and then converting the decimal matrix form (S)_10 into binary matrix form (S)_2;
according to the number of data blocks in the aggregated data block set D_k×k, converting the binary matrix form (S)_2 into matrix form (S)_k with the corresponding radix k;
transforming the matrix (S)_k into a fingerprint coordinate matrix S* of 2 rows and m columns.
4. The data privacy protection method of privacy enhanced semi-blind digital fingerprints according to claim 1, characterized in that: performing permutation encryption on the obtained fingerprint coordinate matrix S* with a permutation encryption algorithm specifically comprises:
performing permutation encryption on the obtained fingerprint coordinate matrix S* by means of a permutation matrix R, wherein the permutation matrix R is based on the order of the letters in a custom permutation key k_1.
5. The data privacy protection method of privacy enhanced semi-blind digital fingerprints according to claim 1, characterized in that: calculating the differential privacy sensitivity of the carrier data set D comprises:
calculating the differential privacy sensitivity Δ_2 f of the carrier data set D according to formula (1):
$$\Delta_2 f = \max_{D, D'} \lVert f(D) - f(D') \rVert_2 \tag{1}$$
where f is a query function, D represents the carrier data set, and D′ represents a neighboring data set of the carrier data set D, that is, D and D′ differ in only one piece of data;
combining the differential privacy sensitivity with the differential privacy parameters ε and δ to calculate the value range of the variance σ of the differential privacy Gaussian mechanism comprises:
calculating the value range of the variance σ of the differential privacy Gaussian mechanism according to formula (2):
Figure FDA0003953800540000021
where Δ_2 f is the differential privacy sensitivity, and ε and δ are both differential privacy parameters.
6. A method of detecting identifying information of an associated user in an illegally distributed data set, characterized in that the method comprises the following steps:
performing privacy protection on a carrier data set according to a data privacy protection method to obtain a corresponding noisy data set, the data privacy protection method being the data privacy protection method of privacy enhanced semi-blind digital fingerprints according to any one of claims 1 to 5;
in the course of executing the data privacy protection method, performing hash calculation on each data block of the aggregated data block set D_k×k that is subjected to random noise block embedding to obtain a hash matrix H;
obtaining a carrier data set D* to be detected, and dividing it into k × k data blocks in the same blocking manner as the carrier data set D to obtain an aggregated data block set {D*}_k×k; calculating the hash value of each data block in the aggregated data block set {D*}_k×k to obtain a hash matrix H*;
comparing the hash matrix H with the hash matrix H* and recording the coordinate points at which the values of the two hash matrices differ, to obtain a coordinate matrix S* of 2 rows and m columns;
locating and extracting the corresponding data blocks in the aggregated data block set {D*}_k×k according to the coordinate matrix S*, calculating the variance σ of these data blocks by Gaussian fitting, and storing the variances σ obtained by the Gaussian fitting in a matrix M;
combining the coordinate matrix S* and the matrix M by rows into a matrix U, and rearranging the matrix U by columns according to the sizes of the elements of the matrix M to obtain a new matrix U;
extracting the first 2 rows and the first k columns of the new matrix U as a base-k noise fingerprint matrix (S*)_k; converting the noise fingerprint matrix (S*)_k into noise fingerprint data in decimal form;
obtaining an identification information character string of the associated user from the noise fingerprint data in decimal form through a permutation decryption algorithm, the identification information character string being the user fingerprint information in character string form.
7. The method of detecting identifying information of an associated user in an illegally distributed data set according to claim 6, characterized in that: recording the coordinate points at which the values of the two hash matrices differ to obtain a coordinate matrix S* of 2 rows and m columns comprises:
recording the coordinate points at which the values of the two hash matrices differ, and recording, column by column, the abscissa in row 1 of the coordinate matrix S* and the ordinate in row 2 of the coordinate matrix S*, to obtain the coordinate matrix S* of 2 rows and m columns.
8. The method of detecting identifying information of an associated user in an illegally distributed data set according to claim 6, characterized in that: calculating the variance σ of these data blocks by Gaussian fitting comprises:
calculating the variance σ of these data blocks according to:
Figure FDA0003953800540000031
where fitGauss represents a Gaussian fitting function, and
Figure FDA0003953800540000032
represents the data block in row r and column c of the aggregated data block set {D*}_k×k.
CN202211459070.4A 2022-11-21 2022-11-21 Data privacy protection method and detection method of privacy enhanced semi-blind digital fingerprint Pending CN115828194A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211459070.4A CN115828194A (en) 2022-11-21 2022-11-21 Data privacy protection method and detection method of privacy enhanced semi-blind digital fingerprint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211459070.4A CN115828194A (en) 2022-11-21 2022-11-21 Data privacy protection method and detection method of privacy enhanced semi-blind digital fingerprint

Publications (1)

Publication Number Publication Date
CN115828194A true CN115828194A (en) 2023-03-21

Family

ID=85529849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211459070.4A Pending CN115828194A (en) 2022-11-21 2022-11-21 Data privacy protection method and detection method of privacy enhanced semi-blind digital fingerprint

Country Status (1)

Country Link
CN (1) CN115828194A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116192388A (en) * 2023-04-26 2023-05-30 广东广宇科技发展有限公司 Mixed key encryption processing method based on digital fingerprint


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination