CN113704709A - Digital watermark data tracing method based on attribute importance index - Google Patents

Digital watermark data tracing method based on attribute importance index

Info

Publication number
CN113704709A
Authority
CN
China
Prior art keywords
data
watermark
attribute
index
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110996040.6A
Other languages
Chinese (zh)
Inventor
徐超
邹云峰
单超
朱峰
范环宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co ltd Marketing Service Center and State Grid Jiangsu Electric Power Co Ltd
Priority to CN202110996040.6A
Publication of CN113704709A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Abstract

The digital watermark data tracing method based on the attribute importance index comprises the following steps: 1, collecting the original data to be distributed, and extracting the condition attributes and the class label attribute of each piece of original data to form a data table; 2, creating a watermark index table according to the data receivers of the original data and generating a key KEY; 3, forming a non-important attribute set attr; 4, embedding the watermark to obtain a data set containing the watermark; 5, distributing the data set containing the watermark according to the data receiver information in the watermark index table, and collecting suspected leaked data that is completely or partially leaked during or after distribution to form a suspected leakage data set; 6, extracting all sub-watermarks from each piece of data in the suspected leakage data set and concatenating them into a complete watermark; and 7, looking up the extracted complete watermark in the watermark index table to find the corresponding data receiver, namely the party that leaked the data, which completes the data leakage tracing.

Description

Digital watermark data tracing method based on attribute importance index
Technical Field
The invention relates to the field of data tracing, in particular to a digital watermark data tracing method based on attribute importance indexes.
Background
With the rapid development of data transmission and sharing technologies, data is frequently sent out of a system. Because this data contains sensitive information about the data owner, preventing an authorized party from forwarding the data without authorization after receiving it has become an urgent problem in data security. For example, data owners such as governments and enterprises hold large amounts of data. To extract valuable information and knowledge from it, they need to send the data to several different third-party data analysis organizations for analysis and processing. An untrusted third party may forward the received data to others, which constitutes illegal forwarding and leaks private data. Determining which third party leaked the data is the key to tracing a data leak.
Digital watermarking is currently a common way to address data copyright problems, and researchers have proposed a series of watermarking algorithms in recent years. Most existing research focuses on preserving data availability and falls broadly into two types: methods based on optimization algorithms and methods based on histogram techniques. Optimization-based methods treat watermark embedding as a constrained optimization problem: the watermark is created with optimization algorithms such as the Genetic Algorithm and Particle Swarm Optimization, with data availability as the constraint during embedding. Histogram-based methods apply the gray-level histogram adjustment used in image watermarking to a database, which yields smaller data perturbation. Other research focuses on watermark security: the watermark is segmented and embedded into several groups so as to keep a certain redundancy and preserve the usability of the watermark.
Existing methods fall short in both data availability and watermark security. Research that focuses on watermark security, in particular, does not take the distribution characteristics of the data into account during embedding, so data availability is greatly damaged. These methods also assume that the data stays complete during distribution, whereas in practice a leaker may leak only some of the data tuples. The watermark embedded in the data is then damaged, which greatly hinders watermark extraction and the tracing of the leaker.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a digital watermark data tracing method based on an attribute importance index.
The invention adopts the following technical solution:
The digital watermark data tracing method based on the attribute importance index comprises the following steps:
Step 1, collecting the original data to be distributed, and extracting the condition attributes A_i (1 ≤ i ≤ n) and the class label attribute L of each piece of original data to form a data table D, where n is the number of condition attributes of each piece of original data, the class label attribute L takes s classes, and the data table D contains M pieces of original data;
Step 2, creating a watermark index table according to the data receivers of the original data in step 1 and generating the key KEY, where the watermark index table contains the information of each data receiver and the original watermark W_ii (1 ≤ ii ≤ G) to be embedded in the original data, and G is the number of data receivers;
Step 3, forming a non-important attribute set attr;
Step 4, embedding the watermark into the corresponding original data according to the non-important attribute set attr and the watermark W_ii (1 ≤ ii ≤ M) assigned to each piece of original data in the watermark index table of step 2, to obtain a data set D_W containing the watermark;
Step 5, distributing the data set D_W obtained in step 4 according to the data receiver information in the watermark index table created in step 2, collecting suspected leaked data that is completely or partially leaked during or after distribution, and assembling it into a suspected leakage data set D_W';
Step 6, for the suspected leakage data set D_W', extracting all sub-watermarks from each piece of data and concatenating them into a complete watermark;
Step 7, looking up the complete watermark extracted in step 6 in the watermark index table created in step 2 to find the corresponding data receiver, namely the party that leaked the data, which completes the data leakage tracing.
In step 1, the class label attribute represents the class of the data and takes s classes in total;
condition attributes are the features of the data from which its class label attribute can be predicted by conventional prediction means.
In step 2, all original data sent to the same data receiver carries the same original watermark.
The key KEY is an arbitrary specified decimal number.
Step 3 comprises the following steps:
Step 301, calculating the information gain ratio GainRatio(A_i, D) of each condition attribute from the data table created in step 1;
Step 302, calculating the Gini coefficient Gini(A_i, D) of each condition attribute from the data table of step 1;
Step 303, computing the weighted average of the information gain ratio GainRatio(A_i, D) obtained in step 301 and the Gini coefficient Gini(A_i, D) obtained in step 302 to obtain the importance index impt_index(A_i, D) of each attribute A_i, sorting the attributes by importance index, and selecting the tt attributes with the smallest importance index as the attributes in which the watermark is to be embedded, forming the non-important attribute set attr, where 1 ≤ tt ≤ n.
In step 301, let the proportion of original data in the j-th class relative to the whole data table be p_j (j = 1, 2, …, s), where s is the total number of classes. The information gain ratio GainRatio(A_i, D) of condition attribute A_i (1 ≤ i ≤ n) satisfies the following relation:
$$\mathrm{GainRatio}(A_i, D) = \frac{\mathrm{Gain}(A_i, D)}{\mathrm{Split\_info}(A_i)}$$
where Gain(A_i, D) is the information gain of condition attribute A_i and Split_info(A_i) is the split information of A_i, which satisfy the following relations:
$$\mathrm{Gain}(A_i, D) = \mathrm{Entropy}(D) - \mathrm{Entropy}(A_i, D)$$
$$\mathrm{Split\_info}(A_i) = -\sum_{m=1}^{r} \frac{|D_m|}{|D|} \log_2 \frac{|D_m|}{|D|}$$
where Entropy(D) is the information entropy of data table D and Entropy(A_i, D) is the conditional entropy of the data table partitioned by condition attribute A_i, which satisfy the following relations:
$$\mathrm{Entropy}(D) = -\sum_{j=1}^{s} p_j \log_2 p_j$$
$$\mathrm{Entropy}(A_i, D) = \sum_{m=1}^{r} \frac{|D_m|}{|D|} \, \mathrm{Entropy}(D_m)$$
where r is the number of subsets D_m (m = 1, 2, …, r) into which data table D is divided by condition attribute A_i, |D_m| is the amount of original data in subset D_m, and |D| is the amount of original data in the data table.
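The relations above can be sketched in a few lines of Python. This is only an illustration: the source does not say how a continuous attribute is partitioned into the r subsets, so the two-way split of the Table 1 `language` labels used here is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(subsets):
    """Gain ratio of a partition; `subsets` holds the label lists of D_1..D_r."""
    all_labels = [lab for sub in subsets for lab in sub]
    total = len(all_labels)
    # Conditional entropy: size-weighted entropy of each subset
    cond = sum(len(sub) / total * entropy(sub) for sub in subsets)
    gain = entropy(all_labels) - cond
    # Split information: entropy of the subset sizes themselves
    split_info = -sum(len(sub) / total * math.log2(len(sub) / total)
                      for sub in subsets if sub)
    return gain / split_info if split_info > 0 else 0.0

# Assumed two-way split of the seven Table 1 rows by attribute X2
left = ["FR", "ES", "FR", "FR"]   # rows with the four smallest X2 values
right = ["ES", "ES", "ES"]        # rows with the three largest X2 values
ratio = gain_ratio([left, right])
print(round(ratio, 3))
```

Under this assumed split the result is about 0.529, close to the 0.53 that the embodiment reports for X2, though the source's exact partitioning rule may differ.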
In step 302, dichotomy is used to partition the data set by condition attribute A_i (1 ≤ i ≤ n) into subsets Z_i1 and Z_i2. First, the values of condition attribute A_i over all original data are sorted in descending order; then the average of adjacent attribute values is computed as a division point, and the data set is divided into two subsets: the data greater than the division point and the data smaller than it;
the two data subsets contain M_i1 and M_i2 pieces of original data respectively, and the Gini coefficient of condition attribute A_i satisfies the following relations:
$$\mathrm{Gini}(A_i, D) = \frac{M_{i1}}{M}\,\mathrm{Gini}(Z_{i1}) + \frac{M_{i2}}{M}\,\mathrm{Gini}(Z_{i2})$$
$$\mathrm{Gini}(Z_{ik}) = 1 - \sum_{j=1}^{s} p_{kj}^{2}, \quad k = 1, 2$$
where s is the total number of classes, p_kj is the proportion of class-j data within subset Z_ik, and Gini(Z_i1) and Gini(Z_i2) are the Gini coefficients of subsets Z_i1 and Z_i2 respectively.
In step 303, the range of tt is 1 ≤ tt ≤ n.
In step 303, the importance index impt_index(A_i, D) satisfies the following relation:
impt_index(A_i, D) = a × GainRatio(A_i, D) + b × Gini(A_i, D), where a and b are secret coefficients satisfying 0 < a, b < 1 and a + b = 1.
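As a quick check of the weighted average and the selection of the tt least important attributes, the sketch below plugs in the gain ratios and Gini coefficients reported for X1 to X3 in the embodiment, with a = b = 0.5. The function names are hypothetical.

```python
def importance_index(gain_ratio, gini, a=0.5):
    """Weighted average a*GainRatio + b*Gini with b = 1 - a."""
    b = 1.0 - a
    return a * gain_ratio + b * gini

def select_unimportant(scores, tt):
    """Return the tt attribute names with the smallest importance index."""
    return sorted(scores, key=scores.get)[:tt]

# Values reported in the embodiment for Table 1
gain_ratios = {"X1": 0.476, "X2": 0.53, "X3": 0.543}
ginis = {"X1": 0.229, "X2": 0.214, "X3": 0.229}

scores = {name: importance_index(gain_ratios[name], ginis[name]) for name in gain_ratios}
attr = select_unimportant(scores, tt=2)
print(attr)  # ['X1', 'X2']
```

The computed indices are 0.3525, 0.372 and 0.386, which agree with the embodiment's 0.353, 0.372 and 0.386 after rounding.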
Step 4 comprises the following steps:
Step 401, splitting the initial watermark W_ii (1 ≤ ii ≤ M) to be embedded in each piece of original data into t sub-watermarks W_iisub[index] (0 ≤ index ≤ t-1);
Step 402, for each piece of original data in data table D, traversing the non-important attribute set attr, taking the integer part integer and the decimal part decimal of each condition attribute value in attr, saving the length of the decimal part as decimal_len, and calculating the embedding position of sub-watermark W_iisub[index] in the decimal part decimal with the position hash function;
Step 403, embedding the watermark into the condition attribute of the original data with the watermark embedding algorithm;
Step 404, repeating steps 402 and 403 until the corresponding watermark W_ii (1 ≤ ii ≤ M) has been embedded in the condition attributes of all original data in data table D.
In step 401, the initial watermark is split as follows:
W_iisub = {W_ii[b] W_ii[b+1] … W_ii[b+sub_len-1]}
b = 0×sub_len, 1×sub_len, …, (t-1)×sub_len
where W_iisub is the set of sub-watermarks of the initial watermark W_ii, and the sub-watermark length is
$$\mathrm{sub\_len} = \frac{\mathrm{len}(W_{ii})}{t}$$
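The splitting rule amounts to cutting the watermark string into t equal-length pieces. A minimal sketch follows, assuming the watermark length is an exact multiple of t, as it is in the embodiment's examples; `split_watermark` is a hypothetical name.

```python
def split_watermark(watermark: str, t: int) -> list[str]:
    """Split `watermark` into t sub-watermarks of equal length sub_len."""
    if t <= 0 or len(watermark) % t != 0:
        # Assumption: the source only shows lengths that divide evenly
        raise ValueError("watermark length must be a positive multiple of t")
    sub_len = len(watermark) // t
    # Start offsets are b = 0*sub_len, 1*sub_len, ..., (t-1)*sub_len
    return [watermark[b:b + sub_len] for b in range(0, len(watermark), sub_len)]

subs = split_watermark("12345678", 2)
print(subs)  # ['1234', '5678']
```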
In step 402, the embedding position satisfies the following relation:
position = H(KEY||H(integer||index)) % decimal_len
where H(KEY||H(integer||index)) is the value obtained by applying the position hash function to KEY||H(integer||index), H(integer||index) is the value obtained by applying the position hash function to integer||index, and decimal_len is the length of the decimal part decimal.
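The position formula can be sketched as below. The source does not name the position hash function H, so SHA-256 (read as a big integer) is used purely as a stand-in, and the string encoding of the `||` concatenation is likewise an assumption. The positions this produces will therefore not match the numbers in the embodiment.

```python
import hashlib

def H(text: str) -> int:
    """Stand-in position hash: SHA-256 digest interpreted as an integer (assumption)."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

def embed_position(key: str, integer: str, index: int, decimal_len: int) -> int:
    """position = H(KEY || H(integer || index)) % decimal_len."""
    inner = H(f"{integer}{index}")           # H(integer || index)
    return H(f"{key}{inner}") % decimal_len  # outer hash keyed with KEY

pos = embed_position("13579", "7", 0, 9)
print(pos)  # some value in range(9); depends on the stand-in hash
```

Because the position depends only on KEY, the integer part and the sub-watermark index, the data owner can recompute it later from a leaked tuple without storing per-tuple state.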
In step 403, the watermark embedding algorithm is:
watermarkedDecimal = decimal[0:position]||W_iisub[index]||decimal[position+sub_len:end];
newValue = integer||watermarkedDecimal
where watermarkedDecimal is the decimal part with the watermark embedded, and newValue is the new condition attribute value formed by concatenating integer and watermarkedDecimal; decimal[0:position] denotes the first position digits of the decimal part decimal, counted from the left; decimal[position+sub_len:end] denotes the digits of the decimal part decimal from the (position+sub_len+1)-th digit to the last digit, counted from the left; and || denotes string concatenation.
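Read literally, the formula overwrites sub_len digits of the decimal part starting at `position`. The sketch below follows that literal reading with Python slicing; it treats attribute values as decimal strings to avoid floating-point rounding, which is an assumption, and `embed_sub_watermark` is a hypothetical name.

```python
def embed_sub_watermark(value: str, sub: str, position: int) -> str:
    """Overwrite len(sub) decimal digits of `value` starting at `position`."""
    integer, _, decimal = value.partition(".")
    # decimal[0:position] || sub || decimal[position+sub_len:end]
    watermarked = decimal[:position] + sub + decimal[position + len(sub):]
    return f"{integer}.{watermarked}"

new_value = embed_sub_watermark("7.071475633", "1234", 2)
print(new_value)  # 7.071234633
```

With this reading the length of the decimal part is unchanged whenever position + sub_len does not exceed decimal_len.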
In step 6, each piece of data in the data set to be traced is traversed: the non-important attributes of each piece of data are found with the method of step 3, the integer part and the decimal part of each non-important attribute value are taken, the embedding position of the watermark is calculated, and all sub-watermarks are extracted and concatenated to form the complete watermark.
Compared with the prior art, the invention has the following beneficial effects:
1. Compared with the traditional genetic algorithm and particle swarm algorithm, constructing the watermark from the information gain ratio and the Gini coefficient of the data's condition attributes is faster, without losing the characteristics of the data.
2. The method considers not only the characteristics of a single piece of data but also the relative importance of each attribute across the data table in which the data resides. The generated watermark therefore has stronger security, uniqueness, secrecy and imperceptibility, the confidentiality and feasibility of tracing data with the method are greatly enhanced, and both data usability and watermark security are taken into account.
3. The invention divides the value of a condition attribute into an integer part and a decimal part before embedding, which supports the data owner more effectively in tracing the data when original data is leaked, and avoids tracing failure when an attacker leaks only part of the original data and thereby damages the watermark.
4. After the watermarked data generated by the invention is distributed to a data receiver, if data classification prediction is needed later, the classification accuracy on the watermarked data is far higher than on watermarked data generated by traditional algorithms.
Drawings
Fig. 1 is a flowchart of a digital watermark data tracing method based on attribute importance index according to the present invention.
Table 1 is a data table of an embodiment of the present invention;
table 2 is a data table after embedding a watermark according to an embodiment of the present invention;
table 3 is a data table revealed by the embodiment of the present invention.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present application is not limited thereby.
Fig. 1 is a flowchart of a digital watermark data tracing method based on attribute importance index, and the method specifically includes the following steps:
Step 1, collecting the original data to be distributed, and extracting the condition attributes A_i (1 ≤ i ≤ n) and the class label attribute L of each piece of original data to form a data table D, where n is the total number of condition attributes of each piece of original data, the class label attribute L takes s classes, and the data table D contains M pieces of original data;
The class label attribute represents the class of the data and takes s classes in total, that is, s is the total number of classes. In this embodiment, as shown in Table 1, the class label attribute is the language of the data; the data in Table 1 falls into 2 classes, ES and FR, so s = 2.
Condition attributes are the features of the data from which its class label attribute can be predicted by conventional prediction means.
Step 2, creating a watermark index table according to the data receivers of the original data in step 1 and generating the key KEY, where the watermark index table contains the information of each data receiver and the original watermark W_ii (1 ≤ ii ≤ G) to be embedded in the original data, and G is the number of data receivers;
All original data sent to the same data receiver carries the same initial watermark; different data receivers may be given different watermarks, while the key KEY is the same for all of them;
Preferably, the key is an arbitrary specified decimal number;
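One way to picture the watermark index table of step 2 is as a mapping from each data receiver to its watermark. The sketch below is only an illustration: the receiver names are hypothetical, and drawing random 8-digit decimal watermarks is an assumption (the source only shows the example watermark 12345678 and does not say how watermarks are chosen).

```python
import secrets

def build_watermark_index(recipients, watermark_len=8):
    """Assign each receiver a distinct random decimal-digit watermark."""
    table = {}
    used = set()
    for name in recipients:
        while True:
            wm = "".join(secrets.choice("0123456789") for _ in range(watermark_len))
            if wm not in used:   # keep watermarks unique so tracing is unambiguous
                break
        used.add(wm)
        table[name] = wm
    return table

# Hypothetical receivers; one shared KEY for all of them, as the text states
index_table = build_watermark_index(["analyst_A", "analyst_B", "analyst_C"])
KEY = "13579"
print(index_table)
```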
Step 3, forming a non-important attribute set attr;
Step 301, calculating the information gain ratio GainRatio(A_i, D) of each condition attribute from the data table created in step 1;
Let the proportion of original data in the j-th class relative to the whole data table be p_j (j = 1, 2, …, s), and calculate the information gain ratio GainRatio(A_i, D) of condition attribute A_i (1 ≤ i ≤ n), which satisfies the following relation:
$$\mathrm{GainRatio}(A_i, D) = \frac{\mathrm{Gain}(A_i, D)}{\mathrm{Split\_info}(A_i)}$$
where Gain(A_i, D) is the information gain of condition attribute A_i and Split_info(A_i) is the split information of A_i, which satisfy the following relations:
$$\mathrm{Gain}(A_i, D) = \mathrm{Entropy}(D) - \mathrm{Entropy}(A_i, D)$$
$$\mathrm{Split\_info}(A_i) = -\sum_{m=1}^{r} \frac{|D_m|}{|D|} \log_2 \frac{|D_m|}{|D|}$$
where Entropy(D) is the information entropy of data table D and Entropy(A_i, D) is the conditional entropy of the data table partitioned by condition attribute A_i, which satisfy the following relations:
$$\mathrm{Entropy}(D) = -\sum_{j=1}^{s} p_j \log_2 p_j$$
$$\mathrm{Entropy}(A_i, D) = \sum_{m=1}^{r} \frac{|D_m|}{|D|} \, \mathrm{Entropy}(D_m)$$
where r is the number of subsets D_m (m = 1, 2, …, r) into which data table D is divided by condition attribute A_i, |D_m| is the amount of original data in subset D_m, and |D| is the amount of original data in the data table.
Step 302, calculating the Gini coefficient Gini(A_i, D) of each condition attribute from the data table of step 1;
Dichotomy is used to partition the data set by condition attribute A_i (1 ≤ i ≤ n) into subsets Z_i1 and Z_i2. First, the values of condition attribute A_i over all original data are sorted in descending order; then the average of each pair of adjacent attribute values is computed, and each average serves as a division point, so qq averages give qq candidate partitions. Each partition divides the data set into two subsets: the data greater than the division point and the data smaller than it. The Gini coefficient is then computed for each partition, and the minimum over all partitions is taken as the final Gini(A_i, D) of the condition attribute.
Taking Table 1 as an example, condition attribute X1 has 6 averages, that is, 6 division points, which give 6 candidate partitions of the data set; the Gini coefficient is computed for each partition, and the smallest of the 6 is taken as the Gini coefficient of the condition attribute.
The two data subsets contain M_i1 and M_i2 pieces of original data respectively, and the Gini coefficient of condition attribute A_i satisfies the following relations:
$$\mathrm{Gini}(A_i, D) = \frac{M_{i1}}{M}\,\mathrm{Gini}(Z_{i1}) + \frac{M_{i2}}{M}\,\mathrm{Gini}(Z_{i2})$$
$$\mathrm{Gini}(Z_{ik}) = 1 - \sum_{j=1}^{s} p_{kj}^{2}, \quad k = 1, 2$$
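The minimum-over-division-points procedure can be checked against Table 1. The sketch below assumes the usual size-weighted Gini impurity of a two-way split; the function names are hypothetical, and the values are sorted in ascending order, which yields the same division points as the descending order described in the text.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum(p_j^2) of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_attribute(values, labels):
    """Minimum size-weighted Gini over all midpoints between adjacent values."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2  # division point: average of adjacent values
        left = [lab for v, lab in pairs if v < cut]
        right = [lab for v, lab in pairs if v > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        best = score if best is None else min(best, score)
    return best

# Table 1 of the embodiment
language = ["ES", "ES", "ES", "FR", "FR", "FR", "ES"]
x1 = [7.071475633, 10.98296717, 7.827108364, 9.985760003, 13.88542526, 13.46616788, 12.28075786]
x2 = [-6.512899664, -5.15744505, -5.477471938, -8.976570322, -6.233852322, -5.783487271, -2.437558361]
x3 = [7.650799805, 3.952060221, 7.816257284, 6.122981616, 2.229776427, 0.693888916, 3.175933842]

for name, col in (("X1", x1), ("X2", x2), ("X3", x3)):
    print(name, round(gini_attribute(col, language), 3))
# X1 0.229
# X2 0.214
# X3 0.229
```

These values match the Gini coefficients reported in the embodiment for X1, X2 and X3.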
Step 303, computing the weighted average of the information gain ratio GainRatio(A_i, D) obtained in step 301 and the Gini coefficient Gini(A_i, D) obtained in step 302 to obtain the importance index impt_index(A_i, D) of each attribute A_i, sorting the attributes by importance index, and selecting the tt attributes with the smallest importance index as the attributes in which the watermark is to be embedded, forming the non-important attribute set attr, where 1 ≤ tt ≤ n;
Preferably, the range of tt is 1 ≤ tt ≤ n;
The importance index impt_index(A_i, D) is defined as follows:
impt_index(A_i, D) = a × GainRatio(A_i, D) + b × Gini(A_i, D), where a and b are secret coefficients satisfying 0 < a, b < 1 and a + b = 1.
Step 4, embedding the watermark into the corresponding original data according to the non-important attribute set attr and the watermark W_ii (1 ≤ ii ≤ M) assigned to each piece of original data in the watermark index table of step 2, to obtain a data set D_W containing the watermark;
Step 401, splitting the initial watermark W_ii (1 ≤ ii ≤ M) to be embedded in each piece of original data into t sub-watermarks W_iisub[index] (0 ≤ index ≤ t-1);
The initial watermark is split as follows:
W_iisub = {W_ii[b] W_ii[b+1] … W_ii[b+sub_len-1] | b = 0×sub_len, 1×sub_len, …, (t-1)×sub_len}
For example, let sub_len = 4 and t = 3; then b = 0, 4, 8, and
W_iisub = {W_ii[0]W_ii[1]W_ii[2]W_ii[3], W_ii[4]W_ii[5]W_ii[6]W_ii[7], W_ii[8]W_ii[9]W_ii[10]W_ii[11]}
W_iisub is the set of sub-watermarks of the initial watermark W_ii, each sub-watermark is denoted W_iisub[index] with 0 ≤ index ≤ t-1, and the sub-watermark length is
$$\mathrm{sub\_len} = \frac{\mathrm{len}(W_{ii})}{t}$$
Step 402, for each piece of original data in data table D, traversing the non-important attribute set attr, taking the integer part integer and the decimal part decimal of each condition attribute value in attr, saving the length of the decimal part as decimal_len, and calculating the embedding position of sub-watermark W_iisub[index] in the decimal part decimal with the position hash function;
For the condition attribute value -6.5128995678664, the integer part integer is -6 and the decimal part decimal is 5128995678664.
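Splitting a value into its integer and decimal parts is easiest on the textual form of the number. The sketch below assumes attribute values are kept as decimal strings, since converting through a float could alter trailing digits; `split_value` is a hypothetical helper.

```python
def split_value(value: str) -> tuple[str, str, int]:
    """Return (integer, decimal, decimal_len) for a decimal string such as '-6.51'."""
    integer, _, decimal = value.partition(".")
    return integer, decimal, len(decimal)

integer, decimal, decimal_len = split_value("-6.5128995678664")
print(integer, decimal, decimal_len)  # -6 5128995678664 13
```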
The embedding position satisfies the following relation:
position = H(KEY||H(integer||index)) % decimal_len
where H(KEY||H(integer||index)) is the value obtained by applying the position hash function to KEY||H(integer||index), and H(integer||index) is the value obtained by applying the position hash function to integer||index; decimal_len is the length of the decimal part decimal;
Step 403, embedding the watermark into the condition attribute of the original data with the watermark embedding algorithm;
The decimal part decimal is divided into a front part and a back part at the position digit, and sub-watermark W_iisub[index] is inserted between the two parts to form the watermarked decimal part watermarkedDecimal, which is then concatenated with the integer part integer to form the new condition attribute value newValue, completing the embedding;
The watermark embedding algorithm is as follows:
watermarkedDecimal = decimal[0:position]||W_iisub[index]||decimal[position+sub_len:end];
newValue = integer||watermarkedDecimal
where decimal[0:position] denotes the first position digits of the decimal part decimal, counted from the left; decimal[position+sub_len:end] denotes the digits of the decimal part decimal from the (position+sub_len+1)-th digit to the last digit, counted from the left; and || denotes string concatenation;
Step 404, repeating steps 402 and 403 until the corresponding watermark W_ii (1 ≤ ii ≤ M) has been embedded in the condition attributes of all original data in data table D;
Step 5, distributing the data set D_W obtained in step 4 according to the data receiver information in the watermark index table created in step 2, collecting suspected leaked data that is completely or partially leaked during or after distribution, and assembling it into a suspected leakage data set D_W';
Step 6, for the suspected leakage data set D_W', extracting all sub-watermarks from each piece of data and concatenating them into a complete watermark;
Each piece of data in the data set to be traced is traversed: the non-important attributes of each piece of data are found with the method of step 3, the integer part and the decimal part of each non-important attribute value are taken, and the embedding position of the watermark is calculated in the same way as in step 402, that is, with the following position hash formula:
position = H(KEY||H(integer||index)) % decimal_len
The embedded sub-watermark is extracted from the position digit to the (position+sub_len-1) digit of the decimal part decimal; this process is repeated for each non-important attribute of each piece of data until all sub-watermarks are extracted, and the sub-watermarks are finally concatenated to form the complete watermark;
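Extraction is the mirror image of embedding: slice sub_len digits out of the decimal part at the computed position and join the pieces in index order. In the sketch below the position is passed in directly rather than recomputed from the hash, positions are taken as zero-based offsets, and values are handled as decimal strings; these are assumptions, and the function name is hypothetical.

```python
def extract_sub_watermark(value: str, position: int, sub_len: int) -> str:
    """Return digits position .. position+sub_len-1 of the decimal part."""
    _, _, decimal = value.partition(".")
    return decimal[position:position + sub_len]

# Watermarked X1 value of the tuple with ID = 1, taken from the embodiment
sub0 = extract_sub_watermark("7.0714712345633", position=5, sub_len=4)
print(sub0)  # 1234

# Sub-watermarks are concatenated in index order to rebuild the watermark
complete = "".join([sub0, "5678"])
print(complete)  # 12345678
```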
Step 7, looking up the complete watermark extracted in step 6 in the watermark index table created in step 2 to find the corresponding data receiver, namely the party that leaked the data, which completes the leakage tracing.
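Step 7 reduces to a lookup of the recovered watermark in the index table. A minimal sketch, with hypothetical receiver names and a second, made-up watermark for contrast:

```python
def trace_leaker(index_table, extracted):
    """Return the receiver whose watermark equals `extracted`, or None if unknown."""
    for recipient, watermark in index_table.items():
        if watermark == extracted:
            return recipient
    return None

# Hypothetical index table: receiver -> watermark
index_table = {"analyst_A": "12345678", "analyst_B": "87654321"}
print(trace_leaker(index_table, "12345678"))  # analyst_A
print(trace_leaker(index_table, "00000000"))  # None
```

Returning None for an unknown watermark is a design choice of this sketch; the source does not describe what happens when the extracted watermark matches no receiver.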
The data table shown in Table 1 has 5 condition attributes, 7 pieces of original data, and the class label attribute language, which gives the class of each piece of original data. Suppose the data owner wants to embed the watermark W = 12345678 with the key KEY = 13579.
The data owner specifies the secret coefficient a = 0.5 and t = 2. The information gain ratio of each condition attribute is calculated as GainRatio(X1, D) = 0.476, GainRatio(X2, D) = 0.53, GainRatio(X3, D) = 0.543; the Gini coefficient of each condition attribute is calculated as Gini(X1, D) = 0.229, Gini(X2, D) = 0.214, Gini(X3, D) = 0.229; and from the information gain ratio and the Gini coefficient the importance index of each condition attribute is calculated as impt_index(X1, D) = 0.353, impt_index(X2, D) = 0.372, impt_index(X3, D) = 0.386. Because t = 2, the two attributes with the smallest importance index, X1 and X2, are selected as the attributes in which the watermark is to be embedded;
The watermark W is divided into two sub-watermarks, W_iisub[0] = 1234 and W_iisub[1] = 5678;
The sub-watermarks are inserted into the decimal parts of the X1 and X2 attribute values of each tuple. For example, for the tuple with ID = 1, the embedding position of sub-watermark W_iisub[0] in attribute X1 is position = H(13579||H(7||0)) % 9 = 5, and the embedding position of sub-watermark W_iisub[1] in attribute X2 is position = H(13579||H(-6||1)) % 9 = 8;
Sub-watermark W_iisub[0] is inserted at the 5th digit of the decimal part of attribute X1 to form the new attribute value 7.0714712345633, and sub-watermark W_iisub[1] is inserted at the 8th digit of the decimal part of attribute X2 to form the new attribute value -6.5128995678664;
This is repeated until the watermark-embedded data table is formed, as shown in Table 2;
An attacker leaks three records of the data, as shown in Table 3. The data owner calculates the sub-watermark embedding position of attribute X1 of the tuple with ID = 1, position = H(13579||H(7||0)) % 9 = 5, and extracts sub-watermark W_iisub[0] = 1234; calculates the sub-watermark embedding position of attribute X2, position = H(13579||H(-6||1)) % 9 = 8, and extracts sub-watermark W_iisub[1] = 5678; the sub-watermarks are then spliced into the complete watermark W = 12345678, and the tracing is complete.
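The X1 numbers of this example can be replayed with plain string operations. In the sketch below the position 5 is taken from the text rather than recomputed, because the position hash function H is not specified. Note that the watermarked value 7.0714712345633 is reproduced when the sub-watermark is inserted at the position (lengthening the decimal part) rather than overwriting digits; this reading is an inference from the example values, and the helper names are hypothetical.

```python
def insert_at(value: str, sub: str, position: int) -> str:
    """Insert `sub` into the decimal part of `value` at `position` (no digits removed)."""
    integer, _, decimal = value.partition(".")
    return f"{integer}.{decimal[:position]}{sub}{decimal[position:]}"

def extract_at(value: str, position: int, sub_len: int) -> str:
    """Read `sub_len` digits of the decimal part starting at `position`."""
    _, _, decimal = value.partition(".")
    return decimal[position:position + sub_len]

original_x1 = "7.071475633"  # X1 of the tuple with ID = 1 in Table 1
marked_x1 = insert_at(original_x1, "1234", 5)
print(marked_x1)                    # 7.0714712345633
print(extract_at(marked_x1, 5, 4))  # 1234
```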
The applicant has described and illustrated embodiments of the invention in detail with reference to the accompanying drawings. Those skilled in the art should understand, however, that the above embodiments are merely preferred embodiments of the invention, and that the detailed description is intended only to help the reader better understand the spirit of the invention, not to limit its scope of protection. On the contrary, any improvement or modification made in the spirit of the invention shall fall within its scope of protection.
TABLE 1
ID language X1 X2 X3
1 ES 7.071475633 -6.512899664 7.650799805
2 ES 10.98296717 -5.15744505 3.952060221
3 ES 7.827108364 -5.477471938 7.816257284
4 FR 9.985760003 -8.976570322 6.122981616
5 FR 13.88542526 -6.233852322 2.229776427
6 FR 13.46616788 -5.783487271 0.693888916
7 ES 12.28075786 -2.437558361 3.175933842
TABLE 2
[Table image not reproduced in this text: data table after embedding the watermark]
TABLE 3
[Table image not reproduced in this text: leaked data table]

Claims (14)

1. A digital watermark data tracing method based on an attribute importance index, characterized by comprising the following steps:
step 1, summarizing the original data to be distributed, and extracting the condition attributes A_i (1 ≤ i ≤ n) and the class label attribute L of each piece of original data to form a data table D, wherein n represents the number of condition attributes of each piece of original data, the class label attribute L corresponds to s classes, and the data table D comprises M pieces of original data;
step 2, creating a watermark index table according to the data receivers of the original data in step 1, and generating a key KEY, wherein the watermark index table comprises the information of each data receiver and the original watermark W_ii (1 ≤ ii ≤ G) to be embedded in the original data, G representing the number of data receivers;
step 3, forming a non-important attribute set attr;
step 4, embedding, according to the non-important attribute set attr and the watermark W_ii (1 ≤ ii ≤ M) recorded in the watermark index table of step 2 for each piece of original data, the watermark into the corresponding original data to obtain a watermarked data set D_W;
step 5, distributing the data set D_W obtained in step 4 according to the data receiver information in the watermark index table created in step 2, collecting suspected leaked data that is wholly or partially leaked during or after distribution, and integrating it into a suspected leaked data set D_W';
step 6, for each piece of data in the suspected leaked data set D_W', extracting all sub-watermarks and connecting them into a complete watermark;
step 7, looking up, in the watermark index table created in step 2, the data receiver corresponding to the complete watermark extracted in step 6, namely the individual who leaked the data, thereby completing the data leak tracing.
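Steps 2 and 7 of claim 1 amount to building a lookup table and later querying it. The following is a minimal sketch only, assuming the watermark index table can be modeled as a mapping from receiver information to a watermark string; the receiver names, the watermark values, and the function name `trace_receiver` are hypothetical and do not come from the claims.

```python
# Hypothetical watermark index table (step 2): receiver info -> original watermark.
# The receiver names and watermark strings are illustrative assumptions.
watermark_index = {
    "receiver_A": "10110010",
    "receiver_B": "01101100",
}


def trace_receiver(extracted_watermark, index_table):
    """Step 7 sketch: return every receiver whose registered watermark
    equals the complete watermark extracted from the leaked data."""
    return [
        receiver
        for receiver, watermark in index_table.items()
        if watermark == extracted_watermark
    ]


print(trace_receiver("01101100", watermark_index))
```

An empty result would indicate that the extracted watermark matches no registered receiver, for example because the leaked data was altered.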
2. The digital watermark data tracing method based on attribute importance index according to claim 1, wherein:
in the step 1, the class label attribute represents the class of the data, and comprises s classes;
the condition attribute refers to the characteristic of the data, and the class label attribute of the data can be predicted by using a conventional prediction means based on the condition attribute.
3. The digital watermark data tracing method based on attribute importance index according to claim 1, wherein:
in step 2, all original data received by the same data receiver contains the same original watermark.
4. The digital watermark data tracing method based on attribute importance index according to claim 1 or 3, characterized in that:
the KEY is a specified arbitrary decimal number.
5. The digital watermark data tracing method based on attribute importance index according to claim 1, wherein:
the step 3 comprises the following steps:
step 301, calculating the information gain ratio GainRatio(A_i, D) of each condition attribute according to the data table created in step 1;
step 302, calculating the Gini coefficient Gini(A_i, D) of each condition attribute according to the data table of step 1;
step 303, performing a weighted average of the information gain ratio GainRatio(A_i, D) obtained in step 301 and the Gini coefficient Gini(A_i, D) obtained in step 302 to obtain the importance index impt_index(A_i, D) of each attribute A_i, sorting the attributes by importance index, and selecting the tt attributes with the smallest importance indexes as the attributes in which the watermark is to be embedded, thereby forming the non-important attribute set attr, wherein 1 ≤ tt ≤ n.
6. The method according to claim 5, wherein the method comprises:
in step 301, the proportion of original data belonging to the j-th class among all data in the data table is denoted p_j (j = 1, 2, …, s), where s is the total number of data classes, and the information gain ratio GainRatio(A_i, D) of the condition attribute A_i (1 ≤ i ≤ n) satisfies the following relation:
$$\mathrm{GainRatio}(A_i, D) = \frac{\mathrm{Gain}(A_i, D)}{\mathrm{Split\_info}(A_i)}$$
wherein Gain(A_i, D) is the information gain of the condition attribute A_i and Split_info(A_i) is the split information of A_i, which satisfy the following relations:
$$\mathrm{Gain}(A_i, D) = \mathrm{Entropy}(D) - \mathrm{Entropy}(A_i, D)$$
$$\mathrm{Split\_info}(A_i) = -\sum_{m=1}^{r} \frac{|D_m|}{|D|} \log_2 \frac{|D_m|}{|D|}$$
wherein Entropy(D) is the information entropy of the data table D and Entropy(A_i, D) is the conditional entropy of the data table divided according to the condition attribute A_i, which respectively satisfy the following relations:
$$\mathrm{Entropy}(D) = -\sum_{j=1}^{s} p_j \log_2 p_j$$
$$\mathrm{Entropy}(A_i, D) = \sum_{m=1}^{r} \frac{|D_m|}{|D|}\, \mathrm{Entropy}(D_m)$$
wherein r indicates that the data table D is divided according to the condition attribute A_i into r subsets D_m (m = 1, 2, …, r), |D_m| represents the amount of original data in the subset D_m, and |D| represents the amount of original data in the data table.
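The gain-ratio computation of claim 6 can be illustrated with a short sketch. It assumes the attribute has already been reduced to discrete values, so that each distinct value defines one subset D_m; the claim itself does not say how continuous values are partitioned. The function names `entropy` and `gain_ratio` are illustrative, not taken from the patent.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Information entropy of a list of class labels (base-2 logarithm)."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def gain_ratio(values, labels):
    """Gain(A_i, D) / Split_info(A_i), treating each distinct attribute
    value as one subset D_m (an assumption made for this sketch)."""
    total = len(labels)
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)

    # Conditional entropy: size-weighted entropy of each subset.
    conditional = sum(len(s) / total * entropy(s) for s in subsets.values())
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(
        len(s) / total * math.log2(len(s) / total) for s in subsets.values()
    )
    gain = entropy(labels) - conditional
    # A single-subset partition has zero split information; return 0 then.
    return gain / split_info if split_info > 0 else 0.0


# An attribute that separates the two classes perfectly.
print(gain_ratio(["low", "low", "high", "high"], ["ES", "ES", "FR", "FR"]))
```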
7. The method according to claim 5, wherein the method comprises:
in step 302, a bisection method is used to divide the data set according to the condition attribute A_i (1 ≤ i ≤ n) into subsets Z_i1 and Z_i2: first, the values of the condition attribute A_i of all original data are arranged in descending order; then the average of adjacent attribute values is calculated as a division point, and the data set is divided into two subsets, one with values greater than the division point and one with values less than it;
the two data subsets contain M_i1 and M_i2 pieces of original data respectively, and the Gini coefficient of the condition attribute A_i satisfies the following relations:
$$\mathrm{Gini}(A_i, D) = \frac{M_{i1}}{M}\, \mathrm{Gini}(Z_{i1}) + \frac{M_{i2}}{M}\, \mathrm{Gini}(Z_{i2})$$
$$\mathrm{Gini}(Z_{ik}) = 1 - \sum_{j=1}^{s} p_j^2, \quad k = 1, 2$$
where s is the total number of data classes, p_j is the proportion of data of the j-th class within the subset concerned, and Gini(Z_i1) and Gini(Z_i2) respectively represent the Gini coefficients of the subsets Z_i1 and Z_i2.
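The bisection procedure of claim 7 can be sketched as follows. The claim does not state which of the candidate division points is finally used, so this sketch assumes the one giving the smallest weighted Gini coefficient, as is common in CART-style trees; that choice and the function names are assumptions. The sample values are the X1 and language columns of Table 1.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient of a list of class labels: 1 - sum_j p_j^2."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_of_attribute(values, labels):
    """Weighted Gini coefficient of a numeric attribute under a binary split.

    Values are sorted in descending order and the midpoint of each pair of
    adjacent distinct values is tried as a division point. Keeping the
    minimum over all division points is an assumption of this sketch.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0], reverse=True)
    total = len(pairs)
    best = None
    for k in range(1, total):
        upper, lower = pairs[k - 1][0], pairs[k][0]
        if upper == lower:
            continue  # identical neighbours give no usable division point
        point = (upper + lower) / 2
        z1 = [label for value, label in pairs if value > point]
        z2 = [label for value, label in pairs if value < point]
        weighted = len(z1) / total * gini(z1) + len(z2) / total * gini(z2)
        if best is None or weighted < best:
            best = weighted
    # With no valid division point the data set cannot be split at all.
    return gini(labels) if best is None else best


# X1 and language columns of Table 1.
x1 = [7.071475633, 10.98296717, 7.827108364, 9.985760003,
      13.88542526, 13.46616788, 12.28075786]
language = ["ES", "ES", "ES", "FR", "FR", "FR", "ES"]
print(gini_of_attribute(x1, language))
```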
8. The method according to claim 5, wherein the method comprises:
in step 303, the range of tt is 1 ≤ tt ≤ n.
9. The method according to claim 5, wherein the method comprises:
in step 303, the importance index impt_index(A_i, D) satisfies the following relation:
impt_index(A_i, D) = a × GainRatio(A_i, D) + b × Gini(A_i, D), wherein a and b are secret coefficients satisfying 0 < a, b < 1 and a + b = 1.
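The weighted average of claim 9 and the selection of the tt least important attributes from claim 5 can be sketched together. The coefficient values, attribute names, and scores below are hypothetical; only the formula and the constraints on a and b come from the claims.

```python
def importance_index(gain_ratio_value, gini_value, a, b):
    """impt_index = a * GainRatio + b * Gini, with 0 < a, b < 1 and a + b = 1."""
    if not (0 < a < 1 and 0 < b < 1 and abs(a + b - 1.0) < 1e-9):
        raise ValueError("a and b must lie in (0, 1) and sum to 1")
    return a * gain_ratio_value + b * gini_value


def select_non_important(scores, tt):
    """Return the tt attribute names with the smallest importance index.

    `scores` maps attribute name -> importance index.
    """
    if not 1 <= tt <= len(scores):
        raise ValueError("tt must satisfy 1 <= tt <= n")
    return sorted(scores, key=scores.get)[:tt]


# Hypothetical importance indexes for three attributes.
scores = {"X1": 0.7, "X2": 0.2, "X3": 0.4}
print(select_non_important(scores, 2))
```

Because a and b are kept secret, an attacker who knows the data but not the coefficients cannot reliably reproduce the ranking, which is presumably why the claim calls them secret coefficients.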
10. The digital watermark data tracing method based on attribute importance index according to claim 1, wherein:
the step 4 comprises the following steps:
step 401, splitting the initial watermark W_ii (1 ≤ ii ≤ M) to be embedded in each piece of original data into t sub-watermarks W_iisub[index] (0 ≤ index ≤ t−1);
step 402, for each piece of original data in the data table D, traversing the non-important attribute set attr, taking the integer part integer and the fractional part decimal of each condition attribute value in attr, saving the length of the fractional part as decimal_len, and calculating, according to the position hash function, the embedding position of the sub-watermark W_iisub[index] in the fractional part decimal;
step 403, embedding the watermark into the condition attribute of the original data by using a watermark embedding algorithm;
step 404, repeating steps 402 and 403 until the corresponding watermark W_ii (1 ≤ ii ≤ M) has been embedded into the condition attributes of all original data in the data table D.
11. The method according to claim 10, wherein the method for tracing the source of the digital watermark data based on the attribute importance index comprises:
in step 401, the segmentation method of the initial watermark includes:
W_iisub = {W_ii[b] W_ii[b+1] … W_ii[b+sub_len−1]}
b = 0×sub_len, 1×sub_len, …, (t−1)×sub_len
wherein W_iisub is the set of sub-watermarks of the initial watermark W_ii, and the sub-watermark length is
sub_len = len(W_ii) / t
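The segmentation of claim 11 cuts the watermark into t consecutive pieces of equal length. A minimal sketch, assuming the watermark is handled as a character string whose length is an exact multiple of t; the function name `split_watermark` is illustrative.

```python
def split_watermark(watermark, t):
    """Split `watermark` into t consecutive sub-watermarks of equal length.

    Assumes len(watermark) is divisible by t, so that every sub-watermark
    has length sub_len = len(watermark) // t.
    """
    if t < 1 or len(watermark) % t != 0:
        raise ValueError("watermark length must be a positive multiple of t")
    sub_len = len(watermark) // t
    # Start offsets are b = 0*sub_len, 1*sub_len, ..., (t-1)*sub_len.
    return [watermark[b:b + sub_len] for b in range(0, t * sub_len, sub_len)]


print(split_watermark("12345678", 4))
```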
12. The method according to claim 10, wherein the method for tracing the source of the digital watermark data based on the attribute importance index comprises:
in the step 402, the embedding position satisfies the following relation:
position = H(KEY || H(integer || index)) % decimal_len
where H(KEY || H(integer || index)) represents the value computed by the position hash function from KEY || H(integer || index), H(integer || index) represents the value computed by the position hash function from integer || index, and decimal_len represents the length of the fractional part decimal.
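The position formula of claim 12 can be sketched with a concrete hash. The claims do not name the position hash function, so SHA-256 from Python's standard `hashlib` is used here purely as a stand-in; treating the operands as decimal strings and the digest as a hexadecimal integer are likewise assumptions of this sketch.

```python
import hashlib


def H(text):
    """Stand-in position hash: SHA-256 hex digest of a string (an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_position(key, integer_part, index, decimal_len):
    """position = H(KEY || H(integer || index)) % decimal_len."""
    inner = H(f"{integer_part}{index}")   # H(integer || index)
    outer = H(f"{key}{inner}")            # H(KEY || H(integer || index))
    return int(outer, 16) % decimal_len   # reduce the digest to a digit offset


# Hypothetical key; integer part and fractional length follow row 1 of Table 1.
print(embed_position("20210827", "7", 0, 9))
```

Because the position depends only on the key, the integer part, and the sub-watermark index, the same position can be recomputed at extraction time as long as the integer part of the value is unchanged.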
13. The method according to claim 10, wherein the method for tracing the source of the digital watermark data based on the attribute importance index comprises:
in step 403, the watermark embedding algorithm is:
watermarkedDecimal = decimal[0:position] || W_iisub[index] || decimal[position+sub_len:end];
newValue = integer || watermarkedDecimal
wherein watermarkedDecimal is the fractional part with the watermark embedded; newValue is the new condition attribute value formed by concatenating integer and watermarkedDecimal; decimal[0:position] represents the 1st through position-th digits, from left to right, of the fractional part decimal; decimal[position+sub_len:end] represents the (position+sub_len+1)-th through last digits, from left to right, of the fractional part decimal; and || represents string concatenation.
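The embedding rule of claim 13 is a digit substitution inside the fractional part. A minimal sketch, assuming attribute values are handled as decimal strings with a fractional part and that the decimal point is kept between integer and watermarkedDecimal; the sub-watermark "99" and the position 3 are hypothetical, while the value is X1 of row 1 in Table 1.

```python
def embed_sub_watermark(value, sub_watermark, position):
    """Overwrite len(sub_watermark) fractional digits starting at `position`.

    `value` is a decimal string such as "7.071475633".
    """
    integer, _, decimal = value.partition(".")
    sub_len = len(sub_watermark)
    # decimal[0:position] || sub-watermark || decimal[position+sub_len:end]
    watermarked_decimal = (
        decimal[:position] + sub_watermark + decimal[position + sub_len:]
    )
    return f"{integer}.{watermarked_decimal}"


print(embed_sub_watermark("7.071475633", "99", 3))  # prints 7.071995633
```

Since only low-order fractional digits of a non-important attribute change, the distortion of the value stays small relative to its magnitude.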
14. The digital watermark data tracing method based on attribute importance index according to claim 1, wherein:
in step 6, each piece of data in the data set to be traced is traversed; the non-important attributes of each piece of data are found by the method of step 3; the integer part and the fractional part of each non-important attribute value are taken; the embedding position of the watermark is calculated; and all sub-watermarks are extracted and connected to form the complete watermark.
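Extraction as described in claim 14 mirrors the embedding. The sketch below assumes that the embedding position and sub-watermark length have already been recomputed and that each sub-watermark is known by its index; the function names and the sample values are hypothetical.

```python
def extract_sub_watermark(value, position, sub_len):
    """Read sub_len fractional digits of `value` starting at `position`."""
    _, _, decimal = value.partition(".")
    return decimal[position:position + sub_len]


def assemble_watermark(sub_watermarks):
    """Concatenate sub-watermarks in index order.

    `sub_watermarks` maps sub-watermark index -> extracted digit string.
    """
    return "".join(sub_watermarks[index] for index in sorted(sub_watermarks))


# Hypothetical watermarked values, with two sub-watermarks of length 2.
recovered = {
    1: extract_sub_watermark("6.512834664", 4, 2),
    0: extract_sub_watermark("7.071125633", 3, 2),
}
print(assemble_watermark(recovered))  # prints 1234
```

The assembled string is then what gets looked up in the watermark index table to identify the data receiver.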
CN202110996040.6A 2021-08-27 2021-08-27 Digital watermark data tracing method based on attribute importance index Pending CN113704709A (en)

Publications (1)

Publication Number Publication Date
CN113704709A true CN113704709A (en) 2021-11-26

Family

ID=78655989

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200336895A1 (en) * 2019-04-22 2020-10-22 Afero, Inc. System and method for internet of things (iot) device validation
CN111241576A (en) * 2020-01-03 2020-06-05 Nanjing University of Posts and Telecommunications Zero watermark method for distribution protection of database
CN112307741A (en) * 2020-12-31 2021-02-02 Beijing University of Posts and Telecommunications Insurance industry document intelligent analysis method and device
CN112800394A (en) * 2021-01-25 2021-05-14 Nanjing University of Posts and Telecommunications Security database watermark construction method based on clustering weighting multidimensional bucket grouping

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Han Yindan: "Research on Database Watermarking Algorithms Based on Chaos and Hash Functions", China Master's Theses Full-text Database, no. 02, 15 February 2015 (2015-02-15), pages 1 - 55 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination