CN113704709A - Digital watermark data tracing method based on attribute importance index - Google Patents
- Publication number
- CN113704709A (CN202110996040.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- watermark
- attribute
- index
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/16—Program or content traceability, e.g. by watermarking
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Technology Law (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Editing Of Facsimile Originals (AREA)
- Image Processing (AREA)
Abstract
A digital watermark data tracing method based on an attribute importance index comprises the following steps: (1) collect the original data to be distributed and extract the prediction attributes and the class label attribute of each piece of original data to form a data table; (2) create a watermark index table from the data recipients of the original data and generate a key KEY; (3) form the set of unimportant attributes attr; (4) embed the watermark to obtain a watermarked data set; (5) distribute the watermarked data set according to the recipient information in the watermark index table, and collect data suspected of having been leaked, in whole or in part, during or after distribution to form a suspected-leak data set; (6) extract all sub-watermarks from each piece of data in the suspected-leak data set and concatenate them into the complete watermark; and (7) look up the extracted complete watermark in the watermark index table to find the corresponding data recipient, i.e. the individual who leaked the data, completing the data leak tracing.
Description
Technical Field
The invention relates to the field of data tracing, and in particular to a digital watermark data tracing method based on an attribute importance index.
Background
With the rapid development of data transmission and sharing technologies, data is frequently sent outside the system that holds it. Because the data contains sensitive information about the data owner, preventing an authorized party from forwarding the data without authorization after obtaining it has become an urgent data security problem. For example, data owners such as governments and enterprises hold large amounts of data; to extract valuable information and knowledge from it, they send the data to several different third-party data analysis organizations for analysis and processing. An untrusted third party may forward the received data to others, resulting in illegal forwarding and disclosure of private data. Identifying the third party that leaked the data is the key to tracing a data leak.
Digital watermarking is currently a common approach to protecting data copyright, and researchers have proposed a series of watermarking algorithms in recent years. Most existing research focuses on preserving data usability and falls into two broad types: methods based on optimization algorithms and methods based on histogram techniques. Optimization-based methods cast watermark embedding as a constrained optimization problem: the watermark is created with an optimization algorithm such as a genetic algorithm or particle swarm optimization, and data usability serves as the constraint during embedding. Histogram-based methods apply the gray-level histogram adjustment used in image watermarking to databases, which keeps the data perturbation small. Other research focuses on watermark security: the watermark is split and embedded into several groups so that a degree of redundancy keeps the watermark usable.
Existing methods fall short in balancing data usability and watermark security. In particular, work that focuses on watermark security does not take the distribution characteristics of the data into account during embedding, so data usability is badly damaged. These methods also assume that the data remains complete during distribution, whereas in practice a leaker may leak only some of the data tuples; this damages the embedded watermark and severely hampers watermark extraction and tracing of the leaker.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a digital watermark data tracing method based on an attribute importance index.
The invention adopts the following technical scheme:
The digital watermark data tracing method based on the attribute importance index comprises the following steps:
Step 1: collect the original data to be distributed, and extract the condition attributes $A_i$ ($1 \le i \le n$) and the class label attribute L of each piece of original data to form a data table D, where n is the number of condition attributes of each piece of original data, the class label attribute L takes s classes, and the data table D contains M pieces of original data;
Step 2: create a watermark index table from the data recipients of the original data in step 1, the table containing the information of each data recipient and the original watermark $W_{ii}$ ($1 \le ii \le G$) to be embedded in the original data, where G is the number of data recipients, and generate the key KEY;
Step 3: form the set of unimportant attributes attr;
Step 4: according to the unimportant attribute set attr and the watermark $W_{ii}$ ($1 \le ii \le M$) to be embedded in each piece of original data as recorded in the watermark index table of step 2, embed the watermark in the corresponding original data to obtain the watermarked data set $D_W$;
Step 5: distribute $D_W$ obtained in step 4 according to the recipient information in the watermark index table created in step 2, collect data suspected of having been leaked, in whole or in part, during or after distribution, and merge it into the suspected-leak data set $D_W'$;
Step 6: extract all sub-watermarks from each piece of data in the suspected-leak data set $D_W'$ and concatenate them into the complete watermark;
Step 7: look up the complete watermark extracted in step 6 in the watermark index table created in step 2 to find the corresponding data recipient, i.e. the individual who leaked the data, thereby completing the data leak tracing.
In step 1, the class label attribute represents the class of the data and takes s classes;
the condition attributes are the features of the data from which its class label attribute can be predicted using conventional prediction means.
In step 2, all original data received by the same data recipient carries the same original watermark.
The key KEY is an arbitrary specified decimal number.
Step 3 comprises the following steps:
Step 301: calculate the information gain ratio GainRatio($A_i$, D) of each condition attribute from the data table created in step 1;
Step 302: calculate the Gini coefficient Gini($A_i$, D) of each condition attribute from the data table of step 1;
Step 303: compute the weighted average of the information gain ratio GainRatio($A_i$, D) obtained in step 301 and the Gini coefficient Gini($A_i$, D) obtained in step 302 to obtain the importance index impt_index($A_i$, D) of each attribute $A_i$; sort the attributes by importance index and select the tt attributes with the smallest importance index as the attributes in which the watermark is to be embedded, forming the unimportant attribute set attr, where $1 \le tt \le n$.
In step 301, let $p_j$ ($j = 1, 2, \dots, s$) be the proportion of the original data in the data table that belongs to the j-th class, where s is the total number of data classes. The information gain ratio GainRatio($A_i$, D) of condition attribute $A_i$ ($1 \le i \le n$) satisfies the following relation:

$$\mathrm{GainRatio}(A_i, D) = \frac{\mathrm{Gain}(A_i, D)}{\mathrm{Split\_info}(A_i)}$$

where Gain($A_i$, D) is the information gain of condition attribute $A_i$ and Split_info($A_i$) is the split information of $A_i$, which satisfy the following relations:

$$\mathrm{Gain}(A_i, D) = \mathrm{Entropy}(D) - \mathrm{Entropy}(A_i, D)$$

$$\mathrm{Split\_info}(A_i) = -\sum_{m=1}^{r} \frac{|D_m|}{|D|} \log_2 \frac{|D_m|}{|D|}$$

where Entropy(D) is the information entropy of data table D and Entropy($A_i$, D) is the conditional entropy of the data table partitioned by condition attribute $A_i$, which satisfy the following relations:

$$\mathrm{Entropy}(D) = -\sum_{j=1}^{s} p_j \log_2 p_j$$

$$\mathrm{Entropy}(A_i, D) = \sum_{m=1}^{r} \frac{|D_m|}{|D|}\, \mathrm{Entropy}(D_m)$$

where r is the number of subsets $D_m$ ($m = 1, 2, \dots, r$) into which data table D is divided by condition attribute $A_i$, $|D_m|$ is the number of pieces of original data in subset $D_m$, and $|D|$ is the number of pieces of original data in the data table.
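The relations in step 301 can be illustrated with a minimal Python sketch. It assumes the partition of D into subsets $D_1, \dots, D_r$ is supplied by the caller as lists of class labels; the text does not say how a continuous attribute is partitioned for this step, so that choice is left open. The function names `entropy` and `gain_ratio` are illustrative and do not come from the patent.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_ratio(subsets):
    """Gain ratio of a partition D_1..D_r, each given as a list of class labels.

    Assumption: the caller has already partitioned D by the attribute A_i.
    """
    subsets = [s for s in subsets if s]              # ignore empty subsets
    all_labels = [lab for s in subsets for lab in s]
    total = len(all_labels)
    # Conditional entropy: size-weighted entropy of each subset.
    cond_entropy = sum(len(s) / total * entropy(s) for s in subsets)
    gain = entropy(all_labels) - cond_entropy
    # Split information: entropy of the subset sizes themselves.
    split_info = -sum(len(s) / total * math.log2(len(s) / total)
                      for s in subsets)
    return gain / split_info if split_info > 0 else 0.0


# A partition that separates the two classes perfectly, and one that does not.
perfect = gain_ratio([["ES", "ES"], ["FR", "FR"]])
useless = gain_ratio([["ES", "FR"], ["ES", "FR"]])
```

Returning 0 when the split information is 0 (a single subset) is a convention chosen for this sketch, not something the text specifies.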
In step 302, a binary split is used to divide the data set by condition attribute $A_i$ ($1 \le i \le n$) into subsets $Z_{i1}$ and $Z_{i2}$. First, the values of condition attribute $A_i$ over all original data are sorted in descending order; then the mean of each pair of adjacent values is taken as a split point, which divides the data set into two subsets: the data greater than the split point and the data smaller than the split point.
Let the two data subsets contain $M_{i1}$ and $M_{i2}$ pieces of original data respectively. The Gini coefficient of condition attribute $A_i$ satisfies the following relation:

$$\mathrm{Gini}(A_i, D) = \frac{M_{i1}}{M}\,\mathrm{Gini}(Z_{i1}) + \frac{M_{i2}}{M}\,\mathrm{Gini}(Z_{i2}), \qquad \mathrm{Gini}(Z) = 1 - \sum_{j=1}^{s} q_j^2$$

where s is the total number of data classes, $q_j$ is the proportion of the data in subset Z that belongs to the j-th class, and Gini($Z_{i1}$) and Gini($Z_{i2}$) are the Gini coefficients of subsets $Z_{i1}$ and $Z_{i2}$ respectively.
In step 303, the range of tt is $1 \le tt \le n$.
In step 303, the importance index impt_index($A_i$, D) satisfies the following relation:
impt_index($A_i$, D) = a × GainRatio($A_i$, D) + b × Gini($A_i$, D), where a and b are secret coefficients satisfying 0 < a, b < 1 and a + b = 1.
Step 4 comprises the following steps:
Step 401: split the initial watermark $W_{ii}$ ($1 \le ii \le M$) to be embedded in each piece of original data into t sub-watermarks $W_{ii}^{sub}[index]$ ($0 \le index \le t-1$);
Step 402: for each piece of original data in data table D, traverse the unimportant attribute set attr, take the integer part integer and the fractional part decimal of each condition attribute value in attr, store the length of the fractional part as decimal_len, and use the position hash function to calculate the embedding position of sub-watermark $W_{ii}^{sub}[index]$ within the fractional part decimal;
Step 403: embed the watermark in the condition attribute of the original data using the watermark embedding algorithm;
Step 404: repeat steps 402 and 403 until the corresponding watermark $W_{ii}$ ($1 \le ii \le M$) has been embedded in the condition attributes of all original data in data table D.
In step 401, the initial watermark is split as follows:
W_ii^sub = { W_ii[b] W_ii[b+1] … W_ii[b+sub_len-1] }
b = 0×sub_len, 1×sub_len, …, (t-1)×sub_len
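The splitting rule of step 401 amounts to cutting the watermark string into consecutive chunks of sub_len characters. A minimal Python sketch follows, assuming the watermark is handled as a string of digits whose length is exactly t × sub_len; the function name `split_watermark` is illustrative.

```python
def split_watermark(watermark: str, sub_len: int) -> list[str]:
    """Split a watermark into consecutive sub-watermarks of sub_len characters.

    The chunk starting at offset b = index * sub_len is sub-watermark [index].
    Assumption: len(watermark) is an exact multiple of sub_len.
    """
    if sub_len <= 0 or len(watermark) % sub_len != 0:
        raise ValueError("watermark length must be a positive multiple of sub_len")
    return [watermark[b:b + sub_len] for b in range(0, len(watermark), sub_len)]


# The watermark used in the embodiment, split into t = 2 sub-watermarks.
subs = split_watermark("12345678", 4)   # ["1234", "5678"]
```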
In step 402, the embedding position satisfies the following relation:
position = H(KEY || H(integer || index)) % decimal_len
where H(integer || index) is the value obtained by applying the position hash function to integer || index, H(KEY || H(integer || index)) is the value obtained by applying the position hash function to KEY || H(integer || index), and decimal_len is the length of the fractional part decimal.
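The text does not name the position hash function H. The sketch below is therefore only an illustration under stated assumptions: H is taken to be SHA-256 read as a non-negative integer, and `||` is taken to be string concatenation of the decimal representations. With a different H the computed positions will differ, so this sketch is not expected to reproduce the positions quoted in the embodiment.

```python
import hashlib


def H(text: str) -> int:
    """Hypothetical position hash: SHA-256 digest read as a big-endian integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def embed_position(key: str, integer: str, index: int, decimal_len: int) -> int:
    """position = H(KEY || H(integer || index)) % decimal_len."""
    inner = H(f"{integer}{index}")          # H(integer || index)
    return H(f"{key}{inner}") % decimal_len  # H(KEY || inner) % decimal_len


# Same inputs always give the same position, which is what lets the data
# owner recompute it at extraction time from the key and the integer part.
pos = embed_position("13579", "7", 0, 9)
```

Because the position depends only on the key, the integer part and the sub-watermark index, it can be recomputed from a single leaked tuple without access to the rest of the table.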
In step 403, the watermark embedding algorithm is:
watermarkedDecimal = decimal[0:position] || W_ii^sub[index] || decimal[position+sub_len:end];
newValue = integer || watermarkedDecimal
where watermarkedDecimal is the fractional part with the watermark embedded; newValue is the new condition attribute value formed by joining integer and watermarkedDecimal; decimal[0:position] denotes the 1st through position-th digits, counted from left to right, of the fractional part decimal; decimal[position+sub_len:end] denotes the (position+sub_len+1)-th through last digits, counted from left to right, of the fractional part decimal; and || denotes string concatenation.
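Read literally, the formula of step 403 maps directly onto string slicing. The Python sketch below follows the formula as written, so the sub-watermark overwrites sub_len digits starting at position and the fractional part keeps its length. It assumes the decimal point is restored when integer and watermarkedDecimal are joined; the function names are illustrative.

```python
def embed_sub_watermark(decimal: str, sub_watermark: str, position: int) -> str:
    """decimal[0:position] || sub_watermark || decimal[position+sub_len:end]."""
    sub_len = len(sub_watermark)
    return decimal[:position] + sub_watermark + decimal[position + sub_len:]


def new_value(integer: str, watermarked_decimal: str) -> str:
    """Join the integer part and the watermarked fractional part.

    Assumption: the decimal point is restored between the two parts.
    """
    return f"{integer}.{watermarked_decimal}"


# Hypothetical position 2, chosen only to show the slicing.
wd = embed_sub_watermark("071475633", "1234", 2)   # "07" + "1234" + "633"
value = new_value("7", wd)                         # "7.071234633"
```

A variant that inserts the sub-watermark without overwriting any digits would use `decimal[position:]` as the tail; which behaviour is intended should be checked against the embodiment.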
In step 6, each piece of data in the data set to be traced is traversed: the unimportant attributes of each piece of data are found using the method of step 3, the integer part and the fractional part of each unimportant attribute value are taken, the embedding position of the watermark is calculated, and all sub-watermarks are extracted and concatenated to form the complete watermark.
Compared with the prior art, the invention has the following beneficial effects:
1. Compared with the traditional genetic algorithm and particle swarm algorithm, constructing the watermark from the information gain ratio and the Gini coefficient of the condition attributes is faster and does not lose the characteristics of the data.
2. The method considers not only the characteristics of a single piece of data but also, across the table, the relative importance of each attribute in the data table to which the data belongs. The generated watermark is therefore more secure, unique, secret and imperceptible, which greatly improves the confidentiality and feasibility of tracing data with the method and balances data usability against watermark security.
3. The invention splits each condition attribute value into an integer part and a fractional part before embedding the watermark. This better supports the data owner in tracing data when original data is leaked, and prevents tracing from failing because an attacker damaged the watermark by leaking only part of the original data.
4. After the watermarked data generated by the invention is distributed to a data recipient, if the data later needs to be used for classification prediction, the classification accuracy on the watermarked data is far higher than on watermarked data generated by traditional algorithms.
Drawings
Fig. 1 is a flowchart of a digital watermark data tracing method based on attribute importance index according to the present invention.
Table 1 is the data table of an embodiment of the present invention;
Table 2 is the data table after watermark embedding according to an embodiment of the present invention;
Table 3 is the leaked data table according to an embodiment of the present invention.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present application is not limited thereby.
Fig. 1 is a flowchart of a digital watermark data tracing method based on attribute importance index, and the method specifically includes the following steps:
Step 1: collect the original data to be distributed, and extract the condition attributes $A_i$ ($1 \le i \le n$) and the class label attribute L of each piece of original data to form a data table D, where n is the total number of condition attributes of each piece of original data, the class label attribute L takes s classes, and the data table D contains M pieces of original data;
The class label attribute represents the class of the data and takes s classes in total, i.e. s is the total number of data classes. As shown in Table 1, in this embodiment the class label attribute is language; the data in Table 1 falls into 2 classes, ES and FR, so s = 2.
The condition attributes are the features of the data from which its class label attribute can be predicted using conventional prediction means.
Step 2: create a watermark index table from the data recipients of the original data in step 1, the table containing the information of each data recipient and the original watermark $W_{ii}$ ($1 \le ii \le G$) to be embedded in the original data, where G is the number of data recipients, and generate the key KEY;
For the same data recipient, all received original data carries the same initial watermark; different data recipients may be given different watermarks, while the key KEY is the same for all of them;
Preferably, the key is an arbitrary specified decimal number;
step 3, forming an unimportant attribute set attr;
Step 301: calculate the information gain ratio GainRatio($A_i$, D) of each condition attribute from the data table created in step 1;
Let $p_j$ ($j = 1, 2, \dots, s$) be the proportion of the original data in the data table that belongs to the j-th class. The information gain ratio GainRatio($A_i$, D) of condition attribute $A_i$ ($1 \le i \le n$) satisfies the following relation:

$$\mathrm{GainRatio}(A_i, D) = \frac{\mathrm{Gain}(A_i, D)}{\mathrm{Split\_info}(A_i)}$$

where Gain($A_i$, D) is the information gain of condition attribute $A_i$ and Split_info($A_i$) is the split information of $A_i$, which satisfy the following relations:

$$\mathrm{Gain}(A_i, D) = \mathrm{Entropy}(D) - \mathrm{Entropy}(A_i, D)$$

$$\mathrm{Split\_info}(A_i) = -\sum_{m=1}^{r} \frac{|D_m|}{|D|} \log_2 \frac{|D_m|}{|D|}$$

where Entropy(D) is the information entropy of data table D and Entropy($A_i$, D) is the conditional entropy of the data table partitioned by condition attribute $A_i$, which satisfy the following relations:

$$\mathrm{Entropy}(D) = -\sum_{j=1}^{s} p_j \log_2 p_j$$

$$\mathrm{Entropy}(A_i, D) = \sum_{m=1}^{r} \frac{|D_m|}{|D|}\, \mathrm{Entropy}(D_m)$$

where r is the number of subsets $D_m$ ($m = 1, 2, \dots, r$) into which data table D is divided by condition attribute $A_i$, $|D_m|$ is the number of pieces of original data in subset $D_m$, and $|D|$ is the number of pieces of original data in the data table.
Step 302: calculate the Gini coefficient Gini($A_i$, D) of each condition attribute from the data table of step 1;
A binary split is used to divide the data set by condition attribute $A_i$ ($1 \le i \le n$) into subsets $Z_{i1}$ and $Z_{i2}$. First, the values of condition attribute $A_i$ over all original data are sorted in descending order; then the mean of each pair of adjacent values is calculated and used as a split point, so qq means give qq candidate splits. Each candidate split divides the data set into two subsets: the data greater than the split point and the data smaller than the split point. The Gini coefficient is calculated for each candidate split, and the minimum over all candidate splits is taken as the final Gini($A_i$, D) of the condition attribute.
Taking Table 1 as an example, condition attribute X1 has 6 means, i.e. 6 split points, which give 6 candidate splits of the data set; the Gini coefficient is calculated for each candidate split, and the smallest of the 6 is taken as the Gini coefficient of the condition attribute.
Let the two data subsets contain $M_{i1}$ and $M_{i2}$ pieces of original data respectively. The Gini coefficient of condition attribute $A_i$ for a given split satisfies the following relation:

$$\mathrm{Gini}(A_i, D) = \frac{M_{i1}}{M}\,\mathrm{Gini}(Z_{i1}) + \frac{M_{i2}}{M}\,\mathrm{Gini}(Z_{i2}), \qquad \mathrm{Gini}(Z) = 1 - \sum_{j=1}^{s} q_j^2$$

where $q_j$ is the proportion of the data in subset Z that belongs to the j-th class.
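The procedure of step 302 can be checked against Table 1 with a short Python sketch. It assumes the weighted Gini form given above and skips split points between equal neighbouring values; the function names `gini` and `gini_index` are illustrative. On the Table 1 data it yields, after rounding to three places, the Gini coefficients quoted in the embodiment for X1, X2 and X3.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(q_j^2) of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def gini_index(values, labels):
    """Minimum weighted Gini over all midpoints of adjacent sorted values."""
    pairs = sorted(zip(values, labels), reverse=True)   # descending by value
    total = len(pairs)
    best = None
    for k in range(1, total):
        if pairs[k - 1][0] == pairs[k][0]:
            continue                                    # no usable split point
        split = (pairs[k - 1][0] + pairs[k][0]) / 2     # mean of neighbours
        upper = [lab for v, lab in pairs if v > split]
        lower = [lab for v, lab in pairs if v < split]
        score = (len(upper) / total) * gini(upper) + (len(lower) / total) * gini(lower)
        best = score if best is None else min(best, score)
    return best


# Table 1 of the embodiment.
language = ["ES", "ES", "ES", "FR", "FR", "FR", "ES"]
X1 = [7.071475633, 10.98296717, 7.827108364, 9.985760003,
      13.88542526, 13.46616788, 12.28075786]
X2 = [-6.512899664, -5.15744505, -5.477471938, -8.976570322,
      -6.233852322, -5.783487271, -2.437558361]
X3 = [7.650799805, 3.952060221, 7.816257284, 6.122981616,
      2.229776427, 0.693888916, 3.175933842]

g1 = round(gini_index(X1, language), 3)   # 0.229
g2 = round(gini_index(X2, language), 3)   # 0.214
g3 = round(gini_index(X3, language), 3)   # 0.229
```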
Step 303: compute the weighted average of the information gain ratio GainRatio($A_i$, D) obtained in step 301 and the Gini coefficient Gini($A_i$, D) obtained in step 302 to obtain the importance index impt_index($A_i$, D) of each attribute $A_i$; sort the attributes by importance index and select the tt attributes with the smallest importance index as the attributes in which the watermark is to be embedded, forming the unimportant attribute set attr, where $1 \le tt \le n$;
Preferably, the range of tt is $1 \le tt \le n$;
The importance index impt_index($A_i$, D) satisfies the following relation:
impt_index($A_i$, D) = a × GainRatio($A_i$, D) + b × Gini($A_i$, D), where a and b are secret coefficients satisfying 0 < a, b < 1 and a + b = 1.
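The weighted average and the selection of the tt least important attributes can be sketched in a few lines of Python. The inputs below are the gain ratios and Gini coefficients quoted in the embodiment for X1 to X3 with a = 0.5; the function names are illustrative and b is derived from the constraint a + b = 1.

```python
def importance_index(gain_ratio: float, gini: float, a: float) -> float:
    """impt_index = a * GainRatio + b * Gini, with b = 1 - a and 0 < a < 1."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    b = 1.0 - a
    return a * gain_ratio + b * gini


def select_unimportant(scores: dict[str, float], tt: int) -> list[str]:
    """Names of the tt attributes with the smallest importance index."""
    return sorted(scores, key=scores.get)[:tt]


# (GainRatio, Gini) per attribute, as quoted in the embodiment.
measures = {"X1": (0.476, 0.229), "X2": (0.53, 0.214), "X3": (0.543, 0.229)}
scores = {name: importance_index(gr, g, a=0.5) for name, (gr, g) in measures.items()}
attr = select_unimportant(scores, tt=2)   # ["X1", "X2"]
```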
Step 4: according to the unimportant attribute set attr and the watermark $W_{ii}$ ($1 \le ii \le M$) to be embedded in each piece of original data as recorded in the watermark index table of step 2, embed the watermark in the corresponding original data to obtain the watermarked data set $D_W$;
Step 401: split the initial watermark $W_{ii}$ ($1 \le ii \le M$) to be embedded in each piece of original data into t sub-watermarks $W_{ii}^{sub}[index]$ ($0 \le index \le t-1$);
The initial watermark is split as follows:
W_ii^sub = { W_ii[b] W_ii[b+1] … W_ii[b+sub_len-1] | b = 0×sub_len, 1×sub_len, …, (t-1)×sub_len }
For example, with sub_len = 4 and t = 3, b takes the values 0, 4 and 8, and
W_ii^sub = { W_ii[0]W_ii[1]W_ii[2]W_ii[3], W_ii[4]W_ii[5]W_ii[6]W_ii[7], W_ii[8]W_ii[9]W_ii[10]W_ii[11] }
where W_ii^sub is the set of sub-watermarks of the initial watermark W_ii, each sub-watermark is denoted W_ii^sub[index] with 0 ≤ index ≤ t-1, and each sub-watermark has length sub_len.
Step 402: for each piece of original data in data table D, traverse the unimportant attribute set attr, take the integer part integer and the fractional part decimal of each condition attribute value in attr, store the length of the fractional part as decimal_len, and use the position hash function to calculate the embedding position of sub-watermark $W_{ii}^{sub}[index]$ within the fractional part decimal;
For the condition attribute value -6.5128995678664, the integer part integer is -6 and the fractional part decimal is 5128995678664.
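Splitting an attribute value into integer and decimal is easiest on the textual form of the number, which avoids floating-point rounding of the digits that carry the watermark. A minimal Python sketch, assuming values are stored as decimal strings; the function name `split_value` is illustrative.

```python
def split_value(value: str) -> tuple[str, str]:
    """Split a decimal string into (integer part, fractional digits).

    The sign stays with the integer part, as in the example in the text.
    Assumption: the value is plain decimal notation, not scientific notation.
    """
    integer, _, decimal = value.partition(".")
    return integer, decimal


integer, decimal = split_value("-6.5128995678664")   # ("-6", "5128995678664")
decimal_len = len(decimal)                           # 13
```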
The embedding position satisfies the following relation:
position = H(KEY || H(integer || index)) % decimal_len
where H(integer || index) is the value obtained by applying the position hash function to integer || index, H(KEY || H(integer || index)) is the value obtained by applying the position hash function to KEY || H(integer || index), and decimal_len is the length of the fractional part decimal;
Step 403: embed the watermark in the condition attribute of the original data using the watermark embedding algorithm;
The fractional part decimal is divided into a front part and a back part at the position-th digit, and sub-watermark $W_{ii}^{sub}[index]$ is placed between them to form the watermarked fractional part watermarkedDecimal, which is then joined to the integer part integer to form the new condition attribute value newValue, completing the embedding;
The watermark embedding algorithm is as follows:
watermarkedDecimal = decimal[0:position] || W_ii^sub[index] || decimal[position+sub_len:end];
newValue = integer || watermarkedDecimal
where decimal[0:position] denotes the 1st through position-th digits, counted from left to right, of the fractional part decimal; decimal[position+sub_len:end] denotes the (position+sub_len+1)-th through last digits, counted from left to right, of the fractional part decimal; and || denotes string concatenation;
Step 404: repeat steps 402 and 403 until the corresponding watermark $W_{ii}$ ($1 \le ii \le M$) has been embedded in the condition attributes of all original data in data table D;
Step 5: distribute $D_W$ obtained in step 4 according to the recipient information in the watermark index table created in step 2, collect data suspected of having been leaked, in whole or in part, during or after distribution, and merge it into the suspected-leak data set $D_W'$;
Step 6: extract all sub-watermarks from each piece of data in the suspected-leak data set $D_W'$ and concatenate them into the complete watermark;
Each piece of data in the data set to be traced is traversed: the unimportant attributes of each piece of data are found using the method of step 3, the integer part and the fractional part of each unimportant attribute value are taken, and the embedding position of the watermark is calculated. The method is the same as in step 402, i.e. the embedding position is calculated with the following position hash formula:
position = H(KEY || H(integer || index)) % decimal_len
The embedded sub-watermark is extracted from the position-th through (position+sub_len-1)-th digits of the fractional part decimal; this process is repeated for each unimportant attribute of each piece of data until all sub-watermarks are extracted, and the sub-watermarks are finally concatenated to form the complete watermark;
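Extraction is the inverse slice of embedding. The Python sketch below assumes the position counts digits from zero, so that the sub-watermark occupies digits position through position+sub_len-1 as stated above, and that sub-watermarks are collected by their index before being joined. The function names are illustrative; the first example uses the watermarked X1 value from the embodiment.

```python
from typing import Optional


def extract_sub_watermark(decimal: str, position: int, sub_len: int) -> str:
    """Digits position .. position+sub_len-1 of the fractional part."""
    return decimal[position:position + sub_len]


def recover_watermark(parts: dict[int, str], t: int) -> Optional[str]:
    """Join sub-watermarks 0..t-1 in index order; None if any is missing."""
    if any(index not in parts for index in range(t)):
        return None
    return "".join(parts[index] for index in range(t))


# Fractional part of the watermarked X1 value 7.0714712345633, position 5.
first = extract_sub_watermark("0714712345633", 5, 4)   # "1234"
full = recover_watermark({0: first, 1: "5678"}, t=2)   # "12345678"
```

Returning None for an incomplete set of sub-watermarks is a choice made for this sketch; the text does not specify what happens when a leaked tuple lacks one of the watermarked attributes.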
Step 7: look up the complete watermark extracted in step 6 in the watermark index table created in step 2 to find the corresponding data recipient, i.e. the individual who leaked the data, thereby completing the leak tracing.
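The lookup of step 7 can be pictured as a mapping from watermark to recipient. In the Python sketch below the table layout, the recipient names and the second watermark are hypothetical, chosen only to show the lookup; the text does not prescribe a storage format for the watermark index table.

```python
from typing import Optional

# Hypothetical watermark index table: one distinct watermark per recipient.
watermark_index = {
    "12345678": "recipient_A",
    "87654321": "recipient_B",
}


def trace_recipient(watermark: str, index_table: dict[str, str]) -> Optional[str]:
    """Return the recipient registered for this watermark, or None if unknown."""
    return index_table.get(watermark)


leaker = trace_recipient("12345678", watermark_index)   # "recipient_A"
```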
The data table shown in Table 1 has 5 condition attributes and 7 pieces of original data, and the class label attribute language gives the class of each piece of original data. Suppose the data owner wants to embed the watermark W = 12345678 with the key KEY = 13579.
The data owner specifies the secret coefficient a = 0.5 and t = 2. The information gain ratios of the condition attributes are calculated as GainRatio(X1, D) = 0.476, GainRatio(X2, D) = 0.53 and GainRatio(X3, D) = 0.543; the Gini coefficients are calculated as Gini(X1, D) = 0.229, Gini(X2, D) = 0.214 and Gini(X3, D) = 0.229; and from these the importance indexes are calculated as impt_index(X1, D) = 0.353, impt_index(X2, D) = 0.372 and impt_index(X3, D) = 0.386. Because t = 2, the two attributes with the smallest importance index, X1 and X2, are selected as the attributes in which the watermark is to be embedded;
The watermark W is split into two sub-watermarks, W^sub[0] = 1234 and W^sub[1] = 5678;
The watermark is inserted into the decimal digits of the X1 and X2 attribute values of each tuple. For the tuple with ID = 1, for example, the embedding position of sub-watermark W^sub[0] in attribute X1 is position = H(13579 || H(7 || 0)) % 9 = 5, and the embedding position of sub-watermark W^sub[1] in attribute X2 is position = H(13579 || H(-6 || 1)) % 9 = 8;
Sub-watermark W^sub[0] is inserted at the 5th decimal digit of attribute X1, giving the new attribute value 7.0714712345633, and sub-watermark W^sub[1] is inserted at the 8th decimal digit of attribute X2, giving the new attribute value -6.5128995678664;
This is repeated until the watermarked data table is formed, as shown in Table 2;
Suppose an attacker leaks three records of the data, as shown in Table 3. For the tuple with ID = 1, the data owner calculates the sub-watermark embedding position of attribute X1, position = H(13579 || H(7 || 0)) % 9 = 5, and extracts the sub-watermark W^sub[0] = 1234; the owner then calculates the sub-watermark embedding position of attribute X2, position = H(13579 || H(-6 || 1)) % 9 = 8, and extracts the sub-watermark W^sub[1] = 5678. The sub-watermarks are concatenated into the complete watermark W = 12345678, and the tracing is complete.
The applicant has described and illustrated embodiments of the present invention in detail with reference to the accompanying drawings. Those skilled in the art should understand, however, that the above embodiments are merely preferred embodiments of the invention and that the detailed description is intended only to help the reader better understand the spirit of the invention, not to limit its scope of protection; on the contrary, any improvement or modification made on the basis of the spirit of the invention falls within the scope of protection of the invention.
TABLE 1
ID | language | X1 | X2 | X3 |
1 | ES | 7.071475633 | -6.512899664 | 7.650799805 |
2 | ES | 10.98296717 | -5.15744505 | 3.952060221 |
3 | ES | 7.827108364 | -5.477471938 | 7.816257284 |
4 | FR | 9.985760003 | -8.976570322 | 6.122981616 |
5 | FR | 13.88542526 | -6.233852322 | 2.229776427 |
6 | FR | 13.46616788 | -5.783487271 | 0.693888916 |
7 | ES | 12.28075786 | -2.437558361 | 3.175933842 |
TABLE 2
TABLE 3
Claims (14)
1. A digital watermark data tracing method based on an attribute importance index, characterized by comprising the following steps:
Step 1: collect the original data to be distributed, and extract the condition attributes $A_i$ ($1 \le i \le n$) and the class label attribute L of each piece of original data to form a data table D, where n is the number of condition attributes of each piece of original data, the class label attribute L takes s classes, and the data table D contains M pieces of original data;
Step 2: create a watermark index table from the data recipients of the original data in step 1, the table containing the information of each data recipient and the original watermark $W_{ii}$ ($1 \le ii \le G$) to be embedded in the original data, where G is the number of data recipients, and generate the key KEY;
Step 3: form the set of unimportant attributes attr;
Step 4: according to the unimportant attribute set attr and the watermark $W_{ii}$ ($1 \le ii \le M$) to be embedded in each piece of original data as recorded in the watermark index table of step 2, embed the watermark in the corresponding original data to obtain the watermarked data set $D_W$;
Step 5: distribute $D_W$ obtained in step 4 according to the recipient information in the watermark index table created in step 2, collect data suspected of having been leaked, in whole or in part, during or after distribution, and merge it into the suspected-leak data set $D_W'$;
Step 6: extract all sub-watermarks from each piece of data in the suspected-leak data set $D_W'$ and concatenate them into the complete watermark;
Step 7: look up the complete watermark extracted in step 6 in the watermark index table created in step 2 to find the corresponding data recipient, i.e. the individual who leaked the data, thereby completing the data leak tracing.
2. The digital watermark data tracing method based on attribute importance index according to claim 1, wherein:
in the step 1, the class label attribute represents the class of the data, and comprises s classes;
the condition attribute refers to the characteristic of the data, and the class label attribute of the data can be predicted by using a conventional prediction means based on the condition attribute.
3. The digital watermark data tracing method based on attribute importance index according to claim 1, wherein:
in step 2, all original data received by the same data receiver contains the same original watermark.
4. The digital watermark data tracing method based on attribute importance index according to claim 1 or 3, characterized in that:
the KEY is a specified arbitrary decimal number.
5. The digital watermark data tracing method based on attribute importance index according to claim 1, wherein:
the step 3 comprises the following steps:
Step 301, calculating the information gain ratio GainRatio(A_i, D) of each condition attribute according to the data table established in step 1;
Step 302, calculating the Gini coefficient Gini(A_i, D) of each condition attribute according to the data table of step 1;
Step 303, computing a weighted average of the information gain ratio GainRatio(A_i, D) obtained in step 301 and the Gini coefficient Gini(A_i, D) obtained in step 302 to obtain the importance index impt_index(A_i, D) of each attribute A_i, sorting the attributes by importance index, and selecting the tt attributes with the smallest importance indexes as the attributes into which the watermark is to be embedded, thereby forming the non-important attribute set attr, wherein 1 ≤ tt ≤ n.
6. The digital watermark data tracing method based on attribute importance index according to claim 5, wherein:
in step 301, the proportion of the original data in the j-th class to the data in the whole data table is denoted p_j (j = 1, 2, …, s), where s is the total number of data classes, and the information gain ratio GainRatio(A_i, D) of the condition attribute A_i (1 ≤ i ≤ n) satisfies the following relation:
wherein Gain(A_i, D) is the information gain of the condition attribute A_i, and Split_info(A_i) is the split information of A_i, which satisfy the following relations:
Gain(A_i, D) = Entropy(D) - Entropy(A_i, D)
wherein Entropy(D) is the information entropy of the data table D, and Entropy(A_i, D) is the conditional entropy of the data table partitioned by the condition attribute A_i, which respectively satisfy the following relations:
wherein r indicates that the data table D is divided into r subsets D_m (m = 1, 2, …, r) according to the condition attribute A_i, |D_m| is the amount of original data in the subset D_m, and |D| is the amount of original data in the data table.
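The quantities named in this claim can be sketched in Python as follows. The sketch assumes the standard C4.5 definitions (base-2 entropy, gain ratio as information gain divided by split information) and a partition of the rows by distinct attribute value; the claim text here does not spell these out, so continuous attributes would need to be discretized first. The function names `entropy` and `gain_ratio` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Base-2 Shannon entropy of a list of class labels (assumed definition)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of one condition attribute, assuming C4.5-style definitions.

    values: attribute value of each row (rows with equal values form a subset D_m)
    labels: class label of each row
    """
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)

    total = len(labels)
    # Conditional entropy: size-weighted entropy of each subset D_m.
    cond_entropy = sum(len(s) / total * entropy(s) for s in subsets.values())
    gain = entropy(labels) - cond_entropy
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(len(s) / total * math.log2(len(s) / total) for s in subsets.values())
    # A single-subset partition has zero split information; report 0 in that case.
    return gain / split_info if split_info > 0 else 0.0
```

An attribute whose values separate the classes perfectly, such as `gain_ratio(["a", "a", "b", "b"], ["ES", "ES", "FR", "FR"])`, scores at the top of the range, while one that is independent of the class scores zero.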
7. The digital watermark data tracing method based on attribute importance index according to claim 5, wherein:
in step 302, a binary split is used to partition the data set by the condition attribute A_i (1 ≤ i ≤ n) into subsets Z_i1 and Z_i2: first, the values of the condition attribute A_i across all original data are arranged in descending order, then the average of adjacent attribute values is taken as a split point, and the data set is divided into two subsets, one greater than and one less than the split point;
the two data subsets contain M_i1 and M_i2 pieces of original data respectively, and the Gini coefficient Gini(A_i, D) of the condition attribute A_i satisfies the following relation:
where s is the total number of data classes, and Gini(Z_i1) and Gini(Z_i2) are the Gini coefficients of the subsets Z_i1 and Z_i2 respectively.
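A minimal Python sketch of this binary-split Gini computation follows. It assumes the usual CART definitions: the Gini coefficient of a subset is one minus the sum of squared class proportions, and the attribute's coefficient is the size-weighted sum over the two subsets. The claim does not say which candidate split point is kept, so the sketch assumes the one with the lowest weighted Gini; the names `gini` and `gini_index` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient of a subset: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def gini_index(values, labels):
    """Weighted Gini of a numeric attribute over its best midpoint split (assumed rule)."""
    distinct = sorted(set(values), reverse=True)  # descending order, as in the claim
    best = None
    for high, low in zip(distinct, distinct[1:]):
        split = (high + low) / 2  # average of adjacent attribute values
        z1 = [y for v, y in zip(values, labels) if v > split]
        z2 = [y for v, y in zip(values, labels) if v <= split]
        weighted = (len(z1) * gini(z1) + len(z2) * gini(z2)) / len(labels)
        best = weighted if best is None else min(best, weighted)
    # With a single distinct value there is no split point; fall back to the whole set.
    return best if best is not None else gini(labels)
```

Taking the minimum over split points mirrors how decision-tree learners choose a threshold; a different selection rule would change only the `min` step.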
8. The digital watermark data tracing method based on attribute importance index according to claim 5, wherein:
in step 303, the range of tt is: 1 ≤ tt ≤ n.
9. The digital watermark data tracing method based on attribute importance index according to claim 5, wherein:
in step 303, the importance index impt_index(A_i, D) satisfies the following relation:
impt_index(A_i, D) = a × GainRatio(A_i, D) + b × Gini(A_i, D), where a and b are secret coefficients satisfying 0 < a, b < 1 and a + b = 1.
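The weighted combination and the selection of the tt least important attributes might look like this in Python. The function name `select_non_important`, the dictionary inputs, and the attribute names in the usage note are illustrative assumptions, not taken from the patent.

```python
def select_non_important(gain_ratios, ginis, a, tt):
    """Return the tt attribute names with the smallest importance index.

    gain_ratios, ginis: dicts mapping attribute name -> precomputed score
    a: secret weight with 0 < a < 1; b is derived as 1 - a so that a + b = 1
    """
    if not 0 < a < 1:
        raise ValueError("a must lie strictly between 0 and 1")
    if not 1 <= tt <= len(gain_ratios):
        raise ValueError("tt must satisfy 1 <= tt <= n")
    b = 1 - a
    impt_index = {
        name: a * gain_ratios[name] + b * ginis[name] for name in gain_ratios
    }
    # Sort ascending by importance index and keep the tt smallest.
    return sorted(impt_index, key=impt_index.get)[:tt]
```

For example, with made-up scores `{"A1": 0.9, "A2": 0.1, "A3": 0.5}` for the gain ratio, `{"A1": 0.4, "A2": 0.2, "A3": 0.3}` for the Gini coefficient, `a = 0.6`, and `tt = 2`, the indexes are roughly 0.70, 0.14, and 0.42, so `A2` and `A3` are selected for embedding.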
10. The digital watermark data tracing method based on attribute importance index according to claim 1, wherein:
the step 4 comprises the following steps:
Step 401, splitting the initial watermark W_ii (1 ≤ ii ≤ M) to be embedded in each piece of original data into t sub-watermarks W_ii^sub[index] (0 ≤ index ≤ t-1);
Step 402, for each piece of original data in the data table D, traversing the non-important attribute set attr, taking the integer part integer and the fractional part decimal of each condition attribute value in attr, saving the length of the fractional part as decimal_len, and calculating the embedding position of the sub-watermark W_ii^sub[index] in the fractional part decimal according to the position hash function;
Step 403, embedding the watermark into the condition attribute of the original data using the watermark embedding algorithm;
Step 404, repeating steps 402 and 403 until the corresponding watermark W_ii (1 ≤ ii ≤ M) has been embedded into the condition attributes of all original data in the data table D.
11. The digital watermark data tracing method based on attribute importance index according to claim 10, wherein:
in step 401, the initial watermark is split as follows:
W_ii^sub = {W_ii[b] W_ii[b+1] … W_ii[b+sub_len-1]}
b = 0×sub_len, 1×sub_len, …, (t-1)×sub_len
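As a small illustration of this splitting rule, the following Python sketch cuts a watermark string into consecutive pieces of length sub_len. The function name `split_watermark` and the requirement that the length divide evenly are assumptions made for the example.

```python
def split_watermark(watermark, sub_len):
    """Split a watermark string into t = len(watermark) / sub_len sub-watermarks."""
    if sub_len <= 0 or len(watermark) % sub_len != 0:
        raise ValueError("watermark length must be a positive multiple of sub_len")
    # b takes the values 0*sub_len, 1*sub_len, ..., (t-1)*sub_len.
    return [watermark[b:b + sub_len] for b in range(0, len(watermark), sub_len)]
```

Calling `split_watermark("10110010", 2)` yields four sub-watermarks, `["10", "11", "00", "10"]`, with list position serving as index.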
12. The digital watermark data tracing method based on attribute importance index according to claim 10, wherein:
in step 402, the embedding position satisfies the following relation:
position = H(KEY_ii || H(integer || index)) % decimal_len
where H(KEY || H(integer || index)) denotes the value obtained by applying the position hash function to KEY || H(integer || index), H(integer || index) denotes the value obtained by applying the position hash function to integer || index, and decimal_len is the length of the fractional part decimal.
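The position computation can be sketched in Python as below. The patent does not name the hash function, so SHA-256 is used here purely as a stand-in; treating `||` as string concatenation and reading the hex digest as an integer are likewise assumptions, and `embed_position` is a hypothetical name.

```python
import hashlib


def H(text):
    """Stand-in position hash: SHA-256 hex digest of the UTF-8 text (assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_position(key, integer, index, decimal_len):
    """position = H(KEY || H(integer || index)) % decimal_len."""
    inner = H(f"{integer}{index}")   # H(integer || index)
    outer = H(f"{key}{inner}")       # H(KEY || inner digest)
    return int(outer, 16) % decimal_len
```

Because the position depends only on the key, the integer part, and the sub-watermark index, the extractor can recompute it from a leaked value as long as the integer part is unchanged.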
13. The digital watermark data tracing method based on attribute importance index according to claim 10, wherein:
in step 403, the watermark embedding algorithm is:
watermarkedDecimal = decimal[0:position] || W_ii^sub[index] || decimal[position+sub_len:end];
newValue = integer || watermarkedDecimal
wherein watermarkedDecimal is the fractional part with the watermark embedded; newValue is the new condition attribute value formed by concatenating integer and watermarkedDecimal; decimal[0:position] denotes the 1st through position-th digits of the fractional part decimal, counted from left to right; decimal[position+sub_len:end] denotes the (position+sub_len+1)-th through last digits of the fractional part decimal, counted from left to right; and || denotes string concatenation.
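The embedding rule translates almost directly into Python string slicing. The sketch below assumes attribute values are handled as decimal strings (so no digits are lost to floating-point rounding) and that the decimal point is reinserted between the two parts; the bounds check and the name `embed_sub_watermark` are additions for the example.

```python
def embed_sub_watermark(value, sub_watermark, position):
    """Overwrite sub_len digits of the fractional part, starting at position.

    value: attribute value as a decimal string, e.g. "7.071475633"
    sub_watermark: digit string to embed
    position: 0-based offset into the fractional part
    """
    integer, decimal = value.split(".")
    sub_len = len(sub_watermark)
    if position + sub_len > len(decimal):
        raise ValueError("sub-watermark does not fit at this position")
    watermarked_decimal = (
        decimal[:position] + sub_watermark + decimal[position + sub_len:]
    )
    return f"{integer}.{watermarked_decimal}"
```

Embedding `"42"` at position 3 of `"7.071475633"` replaces the 4th and 5th fractional digits and gives `"7.071425633"`; the value keeps its length and integer part, so the change stays in the low-order digits.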
14. The digital watermark data tracing method based on attribute importance index according to claim 1, wherein:
in step 6, each piece of data in the data set to be traced is traversed, the non-important attributes of each piece of data are found using the method of step 3, the integer part and the fractional part of each non-important attribute value are taken, the embedding position of the watermark is calculated, and all sub-watermarks are extracted and concatenated to form the complete watermark.
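Extraction reverses the embedding, and the recovered watermark is then looked up in the watermark index table. The following Python sketch assumes the embedding positions have already been recomputed with the position hash function and that the index table is a plain dictionary from watermark to receiver; `extract_watermark`, `trace_receiver`, and the receiver label are hypothetical.

```python
def extract_watermark(values, positions, sub_len):
    """Read one sub-watermark from each watermarked value and concatenate them.

    values: watermarked attribute values as decimal strings, in index order
    positions: recomputed embedding position for each value
    sub_len: length of each sub-watermark
    """
    parts = []
    for value, position in zip(values, positions):
        decimal = value.split(".")[1]
        parts.append(decimal[position:position + sub_len])
    return "".join(parts)


def trace_receiver(watermark, index_table):
    """Look up the receiver for a watermark; None if no entry matches."""
    return index_table.get(watermark)
```

With values `["7.071425633", "-6.512899114"]`, positions `[3, 7]`, and `sub_len = 2`, the extracted watermark is `"4214"`, and `trace_receiver("4214", {"4214": "receiver-B"})` identifies `receiver-B` as the source of the leak.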
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110996040.6A CN113704709A (en) | 2021-08-27 | 2021-08-27 | Digital watermark data tracing method based on attribute importance index |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113704709A (en) | 2021-11-26 |
Family
ID=78655989
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110996040.6A Pending CN113704709A (en) | 2021-08-27 | 2021-08-27 | Digital watermark data tracing method based on attribute importance index |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113704709A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||